Education Resource Search Engine and Chatbot
Take a second to imagine you're a school teacher or administrator. You have a question about your school's tardy policy, and you remember it was updated last year. Your only option is to go to the school's data folder and look through hundreds, if not thousands, of unorganized documents to find your answer. Let's not forget that you're also juggling teaching your classes, meeting with parents, and organizing activities; the list goes on and on. There was no clean, simple way to get questions answered and keep up to date on school and district policies!
Our client, Education Delta, drew on its extensive experience in the education technology space to recognize both this problem and the opportunity it presented, and came to us for help developing an AI knowledge system for school districts.
Together, we identified a few key points:
1. Different staff members (admin vs. teacher vs. office) had access to different information.
It was critical for us to understand the user type and retrieve only the data pertinent to that user, even when documents for different roles cover the same topics. During a fire drill, for example, each type of staff member follows a different procedure. Giving a teacher an administrator's responsibilities could have serious consequences.
2. The documents (school handbooks, policy docs, etc.) vary wildly in format and readability.
For example, as a teacher, you may need to know the policy on bringing homemade treats for your students. It would be safe to assume that the information is in the teacher's handbook. When you finally find the relevant snippet on page 574 of 890, it references a piece of education law on store-bought (not homemade) food, as well as a district-wide flyer on handling student allergies. Information that should be clear and in one place tends to be fragmented across multiple datasets, locations, and phrasings. This makes even the simplest information cognitively taxing to find.
3. The data overlaps: some information in older documents has been superseded by passages in newer documents.
When retrieving data to answer questions, we needed to account for this hierarchy. At times, our system had to determine which of two conflicting sources was more "correct."
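The first and third points above, role-specific retrieval and resolving superseded documents, can both be pictured as a filtering step applied to candidate passages before they reach the model. The sketch below is hypothetical: the `Chunk` fields (`topic`, `roles`, `effective`), the "newest effective date wins" rule, and the sample data are illustrative assumptions, not a description of the production system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Chunk:
    text: str
    topic: str         # hypothetical label for the policy a passage covers
    roles: frozenset   # staff roles allowed to see this passage
    effective: date    # hypothetical: date the passage took effect


def visible_to(chunks, role):
    """Role-based filter: keep only passages this staff type may see."""
    return [c for c in chunks if role in c.roles]


def latest_per_topic(chunks):
    """Assumed hierarchy rule: the most recently effective passage wins per topic."""
    latest = {}
    for c in chunks:
        current = latest.get(c.topic)
        if current is None or c.effective > current.effective:
            latest[c.topic] = c
    return list(latest.values())


def candidate_context(chunks, role):
    """Filter by role first, then drop superseded passages."""
    return latest_per_topic(visible_to(chunks, role))


ALL_STAFF = frozenset({"admin", "teacher", "office"})
chunks = [
    Chunk("Teachers lead their class to the assigned exit.", "fire_drill",
          frozenset({"teacher"}), date(2023, 8, 1)),
    Chunk("Administrators account for all staff at the rally point.", "fire_drill",
          frozenset({"admin"}), date(2023, 8, 1)),
    Chunk("Students arriving after 8:00 are tardy.", "tardy", ALL_STAFF, date(2022, 8, 1)),
    Chunk("Students arriving after 8:10 are tardy.", "tardy", ALL_STAFF, date(2023, 8, 1)),
]

for c in candidate_context(chunks, "teacher"):
    print(c.text)
```

Filtering by role before resolving supersession means a newer passage for another role can never hide an older passage the current user is actually meant to see. In practice, deciding which passages "cover the same topic" is the hard part; the `topic` label here simply assumes that work has already been done.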
With these requirements in mind, our team set out to build an AI system for Education Delta that would let school staff quickly get answers to important questions through efficient retrieval and data understanding.
The PressW team built the system around a RAG (Retrieval-Augmented Generation) chatbot. RAG is a technique that, based on the question being asked, retrieves only the most relevant data from a large dataset and passes it to the language model to generate an answer. This helps mitigate hallucinations (more on that later) and makes the system more accurate. That was critical for an application like this one, where we were working with a large quantity of data and needed to pinpoint specific information within it. The system is fully managed and tailored to each school district: a district only has to upload its documentation, and our data pipelines automatically turn it into queryable data for the chatbot.
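To make the retrieval step concrete, here is a minimal, self-contained sketch of the RAG pattern. It assumes a bag-of-words cosine similarity as a stand-in for a real embedding model and vector store (the post does not say which were used), and it stops at building the prompt rather than calling an actual language model.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=2):
    """Return the k passages most similar to the question."""
    q = vectorize(question)
    ranked = sorted(documents, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, passages):
    """Ground the model: it sees only the retrieved passages, not the whole corpus."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


documents = [
    "Tardy policy: students arriving after 8:10 are marked tardy.",
    "Parking permits are issued by the front office.",
    "Order new educational materials through the curriculum coordinator.",
]
question = "What is the tardy policy?"
print(build_prompt(question, retrieve(question, documents, k=1)))
```

The key design point is that the model never sees the full document set, only the handful of passages the retriever selects, which is what keeps answers tied to the district's own documents.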
We tested several techniques to build a reliable system that consistently and accurately answers questions such as "Who do I contact for ordering new educational materials?" and "What is the football schedule this year?". One of the many challenges we faced was that users' questions don't always match the wording of the underlying data. For example, if a teacher asks about the football schedule, the document section we need may not contain the words "football" or "schedule" at all, just dates and the names of the opposing schools. We overcame this with parent-document retrieval and query expansion (teaser: blog post incoming!). Query expansion broadens a question into related terms and phrasings, while parent-document retrieval matches against small chunks but returns the larger section each chunk belongs to. Together they let us widen the search for information while keeping the focus narrow.
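Here is a toy sketch of how the two techniques combine on the football example. The expansion table, stopword list, line-level child chunks, and sample documents are all hypothetical; a production system might generate query rewrites with an LLM and use a proper chunker and vector index instead.

```python
import re

STOPWORDS = {"what", "is", "the", "this", "a", "an", "of", "to"}

# Hypothetical expansion table; a real system might ask an LLM for rewrites instead.
EXPANSIONS = {
    "football": ["varsity", "game", "vs"],
    "schedule": ["date", "kickoff", "pm"],
}


def tokenize(text):
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def expand_query(question):
    """Query expansion: add related terms the source documents are likely to use."""
    terms = tokenize(question)
    for term in list(terms):
        terms.update(EXPANSIONS.get(term, []))
    return terms


def split_into_children(parents):
    """Split each parent document into small child chunks (one line each), keeping the parent id."""
    return [(pid, line) for pid, text in parents.items()
            for line in text.splitlines() if line.strip()]


def parent_document_search(question, parents, k=2):
    """Match the narrow child chunks, but return the whole parent documents they came from."""
    terms = expand_query(question)
    scored = [(len(terms & tokenize(child)), pid) for pid, child in split_into_children(parents)]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)[:k]
    parent_ids = []
    for _, pid in hits:
        if pid not in parent_ids:
            parent_ids.append(pid)
    return [parents[pid] for pid in parent_ids]


parents = {
    "athletics": "Athletics Calendar\n"
                 "Sep 6: Varsity vs. Eastside, 7:00 PM kickoff\n"
                 "Sep 13: Varsity at North Ridge, 7:30 PM kickoff",
    "handbook": "Staff Handbook\n"
                "Teachers must submit grades by Friday.\n"
                "Report absences to the front office.",
}
print(parent_document_search("What is the football schedule this year?", parents))
```

Without expansion, the question's terms ("football", "schedule", "year") overlap with none of the calendar lines. With expansion, the small game-day chunks match precisely, and returning their parent gives the model the full calendar as context.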
Trust and Hallucinations
A key requirement for this project was to earn and maintain the trust of school district staff, which meant avoiding hallucinations wherever possible. To that end, we added guardrails that short-circuit the AI to a different model and prompting structure whenever the system retrieves no relevant documents. This lets us deterministically generate responses that guide users to other data sources when we can't find an answer, rather than risk a hallucinated response, even one that might be generally correct.
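A minimal sketch of such a guardrail follows. The post describes routing to a different model and prompting structure; this sketch uses a fixed template as the simplest deterministic stand-in for that route. The threshold value, the fallback wording, the `Retrieved` type, and `fake_generate` (in place of a real LLM call) are all hypothetical.

```python
from dataclasses import dataclass

RELEVANCE_THRESHOLD = 0.35  # hypothetical cutoff; would need tuning on real queries

FALLBACK_MESSAGE = (
    "I couldn't find this in your district's documents. "
    "Please check with your campus front office or the district website."
)


@dataclass
class Retrieved:
    text: str
    score: float  # retriever similarity, assumed to lie in [0, 1]


def answer(question, retrieved, generate):
    """Call the model only when at least one passage clears the relevance bar."""
    relevant = [r for r in retrieved if r.score >= RELEVANCE_THRESHOLD]
    if not relevant:
        # Short-circuit: a fixed, deterministic reply instead of an ungrounded generation.
        return FALLBACK_MESSAGE
    context = "\n".join(r.text for r in relevant)
    return generate(question, context)


def fake_generate(question, context):
    """Hypothetical stand-in for the LLM call."""
    return f"Based on district documents: {context}"


print(answer("When is prom?", [Retrieved("Bus routes are posted in August.", 0.12)], fake_generate))
print(answer("What is the tardy policy?",
             [Retrieved("Students arriving after 8:10 are tardy.", 0.81)], fake_generate))
```

Because the check happens before the model is called, a question with no supporting documents can never produce a fluent but unsupported answer; the user always gets the same predictable redirect.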
Initial Results
The system launched on August 6th as a pilot with 700 users in a local school district in Texas. Feedback from all of the beta testers and staff using the system has been overwhelmingly positive, and it is now also being trialed with state agencies and other school districts around the state! In the first two weeks of the beta, the system answered over 1,600 questions, and the largest category of queries concerned school-related schedules, deadlines, events, and compensation details.