CreativeTrail | Devpost


Inspiration

Chris and I remember not enjoying after-school math classes. Today, though, we thank our parents for putting us in them, and sometimes even wish we had been more cooperative. We realized that many students of all ages, including college students, have an incentive to cheat using AI because it usually results in a better grade for less effort. This hinders them in the long term and is unsustainable when it comes to exams. We decided to change the incentive by changing what we grade: instead of grading the final result, we grade the process.

What it does

Our solution focuses solely on essay writing at the moment. A student answers an essay prompt in our text editor, and while they write the essay, throughout all its revision stages, the student is prompted with thoughtful questions. The goal of these questions is to gauge the student's knowledge of their own paper and how they perform on 5 key indicators (Idea Generation, Depth of Analysis, Source Integration, Conceptual Linking, Intellectual Risk Taking).

How we built it

We built it using these main languages: Python, TypeScript, SQL/Prisma schema language, and JSON/JSONL. Python is mainly used to train the supervised learning model. It handles the classifier, feature extraction, routing logic, model inference, calibration, and the bridge script.
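As a rough illustration of how routing logic can sit between a classifier and a simpler fallback, here is a minimal sketch. It is not the project's actual code: the names `route`, `keyword_fallback`, `stub_classifier`, the `0.6` threshold, and the keyword rules are all hypothetical, and the real classifier is replaced by a stub so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# The 5 indicators named in this write-up, as snake_case labels (naming is an assumption).
INDICATORS = (
    "idea_generation",
    "depth_of_analysis",
    "source_integration",
    "conceptual_linking",
    "intellectual_risk_taking",
)


@dataclass
class Routed:
    label: str         # which indicator the text was assigned to
    confidence: float  # confidence reported by the classifier
    source: str        # "model" or "fallback"


def keyword_fallback(text: str) -> str:
    """Hypothetical heuristic used only when the model is not confident."""
    lowered = text.lower()
    if "according to" in lowered or "cites" in lowered:
        return "source_integration"
    return "depth_of_analysis"


def route(
    text: str,
    classify: Callable[[str], Tuple[str, float]],
    fallback: Callable[[str], str] = keyword_fallback,
    threshold: float = 0.6,  # assumed cutoff; a real one would come from calibration
) -> Routed:
    """Trust the classifier only when its confidence clears the threshold."""
    label, confidence = classify(text)
    if label in INDICATORS and confidence >= threshold:
        return Routed(label, confidence, "model")
    return Routed(fallback(text), confidence, "fallback")


def stub_classifier(text: str) -> Tuple[str, float]:
    """Stand-in for the trained model, so the sketch is self-contained."""
    if "what if" in text.lower():
        return ("idea_generation", 0.9)
    return ("conceptual_linking", 0.3)


confident = route("What if the author is unreliable?", stub_classifier)
unsure = route("According to Smith, the data is flawed.", stub_classifier)
print(confident.source, confident.label)
print(unsure.source, unsure.label)
```

Keeping the confidence value and the source of the decision in the result makes it possible to log how often the fallback fires, which is useful when deciding whether the model needs more training data.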

The front end is written mainly in TypeScript. It runs the app logic, calls the Python bridge, and handles prompting, logging, reporting, the debug UI, and the API routes. JSON/JSONL is used heavily for configs, datasets, labels, model artifacts, runtime logs, and reports.

Challenges we ran into

The supervised learning model took longer to train than expected and was inaccurate at first. We solved this by adding a fallback and confidence scoring.
We started with heuristics but realized they aren't accurate enough to capture semantics.
We struggled to define the 5 categories/indicators and to decide how to ask questions and what type to ask.
Working with AI and trying to avoid drift was a challenge. We used spec files to keep it on track.
Question outputs kept repeating. We needed the questions asked to cover all 5 categories to ensure fair grading. On the one hand, we wanted the most relevant question on the stack to be asked; on the other, we didn't want questions repeated. We solved this by skipping a question if its category had already been addressed.
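The question-selection rule from the last challenge can be sketched as follows. This is a minimal illustration under assumptions, not our exact implementation: the `Question` fields, the numeric `relevance` score, and the function name `next_question` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Question:
    text: str
    category: str     # one of the 5 indicators
    relevance: float  # assumed score: higher means more relevant right now


def next_question(candidates: List[Question], covered: Set[str]) -> Optional[Question]:
    """Return the most relevant question whose category has not been asked yet.

    Marks the chosen category as covered; returns None once every
    candidate belongs to an already covered category.
    """
    fresh = [q for q in candidates if q.category not in covered]
    if not fresh:
        return None
    best = max(fresh, key=lambda q: q.relevance)
    covered.add(best.category)
    return best


stack = [
    Question("Why did you choose this thesis?", "idea_generation", 0.9),
    Question("What other thesis did you consider?", "idea_generation", 0.8),
    Question("How does source A support your claim?", "source_integration", 0.5),
]
covered: Set[str] = set()
first = next_question(stack, covered)
second = next_question(stack, covered)
third = next_question(stack, covered)
print(first.text)
print(second.text)
print(third)
```

In this example the second idea-generation question is skipped even though it scores higher than the source-integration one, which is the trade-off described above: relevance decides the order, but only among categories that have not been addressed yet.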