### The Problem

At Georgetown University’s teaching center (CNDLS), learners and faculty can request help with classroom technology through a “Ring our Doorbell” button on the website. These requests are often urgent: a professor about to start class might need help with Canvas, or a learner might be struggling to upload a paper to Turnitin.

The challenge? While waiting for a human support staff member, users often remain stuck. In many cases, the answers already exist online in FAQs or help articles, but digging for them in the moment is slow and frustrating.

We set out to **reduce this lag time to zero** by creating an **AI chatbot** that could provide instant, reliable guidance while users waited for a human follow-up.

---

### Our Approach

We designed a **retrieval-based chatbot**: instead of trying to “generate” answers from scratch, the bot searches existing help documentation and retrieves the most relevant response. This design fit our context well, since the majority of user queries concerned a small set of platforms: **Canvas, Zoom, and Turnitin**.

#### Step 1: Collecting the Data

We started by scraping the official FAQ and instructor guides for each platform:

- [Canvas Instructor Guide](https://community.canvaslms.com/t5/Instructor-Guide/tkb-p/Instructor)
- [Zoom FAQ](https://uis.georgetown.edu/zoom/faq/)
- [Turnitin Instructor FAQ](https://www.turnitin.com/help_pages/instructor_faq.asp)

Using **Python** libraries like `BeautifulSoup` and `requests`, we pulled questions and answers into a structured CSV dataset: `(question, answer, source link)`. This gave us a knowledge base of query–answer pairs to work with.

---

#### Step 2: Building the Search Engine

We needed a way to quickly match a user’s query with the right answer from our dataset. To do this, we built a search engine that:

1. **Processes the query** → tokenizes the input and prepares it for comparison.
2. **Searches the inverted index** → finds all documents containing the query terms.
3. **Ranks results with TF-IDF** → uses **Term Frequency–Inverse Document Frequency** to score how relevant each document is.
4. **Returns the top results** → the highest-scoring answer(s) are retrieved.

If multiple documents matched, we used **cosine similarity** to compare the query vector to each document vector, ensuring the most relevant answers came first.
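To make this pipeline concrete, here is a minimal sketch of the retrieval core: an inverted index, TF-IDF weighting, and cosine-similarity ranking. It is illustrative rather than our production code; the tokenizer, the raw TF × IDF weighting, and names like `TfidfSearchEngine` are simplifying assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split on non-alphanumeric characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


class TfidfSearchEngine:
    """Minimal inverted-index + TF-IDF retrieval over the answer documents."""

    def __init__(self, documents):
        self.documents = documents                       # raw answer texts
        self.doc_tokens = [tokenize(doc) for doc in documents]
        self.n_docs = len(documents)
        self.inverted_index = defaultdict(set)           # term -> doc ids
        for doc_id, tokens in enumerate(self.doc_tokens):
            for term in tokens:
                self.inverted_index[term].add(doc_id)

    def _tfidf(self, tokens):
        """Build a sparse TF-IDF vector as a dict of term -> weight."""
        vector = {}
        for term, tf in Counter(tokens).items():
            df = len(self.inverted_index.get(term, ()))
            if df:                                       # ignore unseen terms
                vector[term] = tf * math.log(self.n_docs / df)
        return vector

    @staticmethod
    def _cosine(u, v):
        """Cosine similarity between two sparse vectors."""
        dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    def search(self, query, top_k=3):
        """Return the top_k (score, document) pairs for a user query."""
        query_vec = self._tfidf(tokenize(query))
        # Only score documents that share at least one term with the query.
        candidates = set()
        for term in query_vec:
            candidates |= self.inverted_index[term]
        scored = [
            (self._cosine(query_vec, self._tfidf(self.doc_tokens[i])), self.documents[i])
            for i in candidates
        ]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
```

In use, `documents` would be the answer column of the scraped CSV, and the top-ranked answers feed the summarization step described in Step 3.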
We also created a **test dataset** of 20 query/document pairs and measured performance with standard search engine metrics:

- **Precision** – proportion of retrieved documents that were relevant.
- **Recall** – proportion of relevant documents that were retrieved.
- **F1 Score** – harmonic mean of precision and recall.
- **MAP (Mean Average Precision)** – average precision across all test queries.

---

#### Step 3: Making Answers User-Friendly

While our search engine could pull the right content, the answers were often long and overwhelming, more like a manual than a quick fix. To solve this, we added a **summarization layer** powered by **GPT-3.5**.

The workflow looked like this:

1. The search engine retrieves the most relevant documents.
2. The documents are passed to GPT-3.5 through the Chat Completions API.
3. GPT-3.5 condenses the material into a short, instructional answer.

For example:

<details>
<summary><b>User query:</b></summary>

“What is Canvas?”

</details>

<details>
<summary><b>Raw search result:</b></summary>

A long article on Canvas’ functionality, covering login, course design, and roles.

</details>

<details>
<summary><b>Chatbot output:</b></summary>

“Canvas is a Learning Management System (LMS) accessed through a web browser. It doesn’t require installation, and your institution provides the login link. In Canvas, instructors can manage courses, assignments, and grades.”

</details>

> This transformation made the responses much more actionable.

---

#### Step 4: Feedback Loop and Continuous Improvement

A key part of our design was **learning from real users**.

- The chatbot automatically **logged every query and the answer it provided**.
- After each response, it asked the user for a **rating**.
- If a user rated the answer poorly, the chatbot **emailed the query and its response to a human support staff member**.
- The human then solved the issue, and their solution was **added back into the chatbot’s dataset**, linking the new query with a high-quality answer.

This meant the chatbot **evolved over time**, handling more problems with each interaction and further reducing the load on support staff. A rough sketch of this rating handler appears at the end of this post.

---

### Results

> [!TIP]
> **The result?** We showed that pairing **classic search techniques (TF-IDF, inverted index)** with **modern large language models (GPT-3.5)** and a **human-in-the-loop feedback system** can deliver **real-time, evolving tech support** that dramatically improves the user experience in higher education.

- **Turnaround time reduced to zero**: users receive instant assistance while waiting for a human response.
- The chatbot effectively handles many common queries about Canvas, Zoom, and Turnitin.
- The **feedback loop ensured continuous improvement**: with each low-rated answer, the bot learned and became more reliable.
- Support staff now face fewer repetitive requests, allowing them to focus on more complex problems.
- Our evaluation confirmed strong retrieval accuracy, though summarization quality varied: some answers were spot-on, others vague or occasionally nonsensical.

---

### What We Learned

- Retrieval-based chatbots work extremely well in **domain-specific contexts** with a limited knowledge base.
- Adding a **summarization layer** makes results more useful, but LLM performance was inconsistent at the time.
- The **human-in-the-loop design** was critical: instead of replacing support staff, the chatbot worked alongside them, gradually expanding its own capabilities.
- **Evaluation is tricky**: while we could measure retrieval accuracy with standard metrics, measuring summarization quality requires different approaches (user testing, ratings, etc.).

---

### Future Directions

- Expand coverage beyond Canvas, Zoom, and Turnitin.
- Deepen the feedback system with analytics dashboards for staff.
- Extend the rating loop so that good answers, not just bad ones, are recorded and used to reinforce quality responses.
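As a closing illustration of the Step 4 feedback loop (and the rating extension proposed above), here is a rough sketch of what the rating handler could look like. It is an assumption-laden sketch, not our production code: the log path, email addresses, SMTP host, and rating threshold are all placeholders.

```python
import csv
import smtplib
from datetime import datetime
from email.message import EmailMessage

LOG_PATH = "chat_log.csv"                  # illustrative path
SUPPORT_EMAIL = "support@example.edu"      # illustrative address


def log_interaction(query, answer, rating):
    """Append the query, the chatbot's answer, and the user's rating to a log."""
    with open(LOG_PATH, "a", newline="") as f:
        csv.writer(f).writerow([datetime.now().isoformat(), query, answer, rating])


def escalate_to_staff(query, answer):
    """Email a poorly rated exchange to a human support staff member."""
    msg = EmailMessage()
    msg["Subject"] = "Chatbot escalation: low-rated answer"
    msg["From"] = "chatbot@example.edu"        # illustrative sender
    msg["To"] = SUPPORT_EMAIL
    msg.set_content(f"Query: {query}\n\nChatbot answer: {answer}")
    with smtplib.SMTP("localhost") as server:  # assumes a local mail relay
        server.send_message(msg)


def handle_rating(query, answer, rating, threshold=3):
    """Log every rating; escalate low ones so a human can supply a better answer."""
    log_interaction(query, answer, rating)
    if rating < threshold:
        # The staff member's eventual solution is added back into the dataset,
        # pairing the new query with a high-quality answer (Step 4).
        escalate_to_staff(query, answer)
    # Ratings at or above the threshold could also be kept as positive signal
    # to reinforce known-good answers (see Future Directions).
```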