
Essay Grading with
Artificial Intelligence
Product | Machine Learning | UX | Accessibility
The Product
Barbri Bar Review is a premier edtech resource in legal education, offering unlimited essay submissions graded by professional attorneys to help students strengthen their writing skills and pass the Bar Exam.
The Problem
With courses lasting only 6–8 weeks and more than 80,000 essays submitted each season, the current grading process struggles to deliver fast, high-quality feedback, making it challenging for professional graders and administrators to keep pace.
The Project
(1) Design a seamless, professional grading experience that empowers graders to efficiently manage and access ungraded essays by filtering and retrieving submissions based on jurisdiction, law school, or essay topic.
(2) Leverage artificial intelligence to enhance the evaluation process by assisting with essay rubric analysis.
(3) Deliver meaningful feedback by providing students with actionable insights and scoring data to guide and mentor them throughout their Bar Exam preparation.
Tools | Figma, Adobe Suite, Miro, JIRA
Duration | May 2024 - August 2024
Team | Society Product Squad (1 Product Owner, 1 PM, 1 UX, 7-8 Devs) + ML/AI Data Team
My Role | Lead Product Designer
The Process | TOC
Background
Gathering Requirements
Creating Cohesion
Classifying Feedback
Working with the ML Team
Testing with Graders
Creating Confidence
Making Adjustments
Outcomes


Background
As part of BARBRI’s LMS modernization, I designed the grader-facing counterpart to the student essay app—streamlining the review process within a greater enterprise essay management system. Collaborating with a team of ML engineers, we integrated an AI-powered scoring tool to reduce grading time and manage the high volume of submissions during each Bar Review.
This solution was vital to upholding BARBRI’s promise of unlimited essay feedback and ensuring students received the prep they needed before the bar exam.
It was my responsibility to lead the 0-to-1 design of this emerging technology, with the goal of reducing grading time through AI-assisted scoring so that every student receives timely, high-quality feedback in preparation for the bar exam.
Gathering Requirements
Prior to the start of this project segment, my Product Owner and I defined the core business requirements as part of a larger planning initiative for the full essay management system. We used Miro to map out system workflows, Figma to explore early design concepts, and Jira to align cross-functional efforts and track progress.


Creating Cohesion
To maintain a unified user experience across the LMS, I ensured consistent UX patterns between the student-facing and grader-facing essay tools. Though tailored to different user roles, both experiences live within the same application—students submit essays, and graders review and return feedback. By aligning design elements and interaction models, we reinforced cohesion across BARBRI products, improved usability, and reduced development time through reusable components.
Student-Essay UI
Essay AI-Grader UI


Classifying Feedback
To design an effective experience for both students and graders, we needed to clearly distinguish between two types of feedback:

- Quantitative (Graded): Rubric-based evaluations that contribute to the student's score and performance metrics.
- Qualitative (Instructional): Insightful, freeform comments from professional graders offering guidance or encouragement, without affecting the final score.

Understanding this distinction directly shaped key design decisions. Visually separating the feedback types improved clarity and learning outcomes for students. Additionally, well-defined rubric structures were essential for the Machine Learning team to train the AI to accurately evaluate essays and assign scores based on grading criteria.
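To make the distinction concrete, here is a minimal sketch of how the two feedback types could be modeled; the type and field names are illustrative assumptions, not BARBRI's production schema.

```typescript
// Illustrative sketch only: names and fields are assumptions, not BARBRI's schema.

// Quantitative (graded) feedback: a rubric-based evaluation that counts toward the score.
interface RubricEvaluation {
  rubricId: string;
  criterion: string;                 // e.g. "Identifies the controlling rule of law"
  pointsAwarded: number;             // 0 to the rubric's max, half-points allowed
  maxPoints: number;
  responseQuality: "poor" | "needs-improvement" | "passing";
}

// Qualitative (instructional) feedback: freeform grader commentary, never scored.
interface InstructionalComment {
  graderId: string;
  text: string;                      // guidance or encouragement from the grader
  anchoredToRubricId?: string;       // optionally tied to a rubric item for context
}

// A returned essay carries both kinds of feedback, kept strictly separate in the UI.
interface GradedEssay {
  essayId: string;
  evaluations: RubricEvaluation[];   // drives the score report
  comments: InstructionalComment[];  // displayed as mentorship, excluded from scoring
}
```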
Working with the Machine Learning Team
Alongside designing the grader experience, I closely supported our Machine Learning team in understanding the nuances of essay feedback—particularly the quantitative aspects. Our shared goal was to leverage AI to accelerate the scoring process by automating rubric evaluation and generating final scores based on assignment-specific jurisdictional scales (6, 10, or 100-point systems).
The complexity of this challenge made AI integration essential. Each essay assignment could include upwards of 80 rubric items, each with its own point value (ranging from 0 to 1+, including half-points). Points were assigned based on "response quality," a scale evaluating the strength of the student's argument or answer, categorized as poor, needs improvement, or passing.
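As a rough illustration of the arithmetic involved, the sketch below totals rubric points and converts the raw result to an assignment's jurisdictional scale; the function names and the straight proportional conversion are assumptions for illustration, not the team's actual scoring logic.

```typescript
// Illustrative sketch: assumes a simple proportional conversion to the jurisdictional scale.
type JurisdictionalScale = 6 | 10 | 100;

interface RubricItem {
  pointsAwarded: number;  // 0 to maxPoints, half-points allowed
  maxPoints: number;
}

// Sum the points awarded across all rubric items (an assignment may have 80+).
function rawScore(items: RubricItem[]): number {
  return items.reduce((total, item) => total + item.pointsAwarded, 0);
}

// Convert the raw rubric total to the assignment's 6-, 10-, or 100-point scale.
function scaledScore(items: RubricItem[], scale: JurisdictionalScale): number {
  const max = items.reduce((total, item) => total + item.maxPoints, 0);
  if (max === 0) return 0;
  return Math.round((rawScore(items) / max) * scale * 10) / 10; // one decimal place
}

// Example: 52.5 of 70 possible rubric points on a 10-point jurisdiction yields 7.5
const example = scaledScore([{ pointsAwarded: 52.5, maxPoints: 70 }], 10);
```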

To train the AI, we sourced and analyzed historical essay submissions with known scoring outcomes. These examples were carefully parsed and tagged by response quality to create a dataset that allowed the model to begin learning how to score with contextual accuracy. After several months of iteration and refinement, the model achieved a scoring accuracy of 65%, enabling us to begin validation testing with professional graders and administrative stakeholders and further train our algorithm.
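For context on what "scoring accuracy" means here, below is a minimal sketch of how model scores could be checked against historically graded essays; the half-point tolerance and all names are assumptions, since the team's actual evaluation criteria were internal.

```typescript
// Illustrative sketch: compares AI-generated scores to known grader scores.
// The half-point tolerance and all names are assumptions for illustration.
interface LabeledEssay {
  essayId: string;
  graderScore: number;   // known outcome from a professional grader
  modelScore: number;    // score produced by the model
}

function scoringAccuracy(essays: LabeledEssay[], tolerance = 0.5): number {
  const matches = essays.filter(
    (e) => Math.abs(e.modelScore - e.graderScore) <= tolerance,
  ).length;
  return essays.length === 0 ? 0 : matches / essays.length;
}

// An accuracy of 0.65 would correspond to the 65% milestone described above.
```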
Testing with Graders
Introducing this new platform required a shift in grader workflows. Previously, graders manually scored each rubric, tallied the total, and left it to students to interpret their jurisdictional scale. With the new system, graders instead reviewed AI-generated scores for accuracy—actively training the AI with every correction—while continuing to provide instructional feedback.
We conducted user testing with graders to evaluate whether the full scoring and feedback process could be completed in under 15 minutes per essay. Initial sessions averaged 20–30 minutes due to the learning curve of the new interface and the cognitive load of parsing numerous rubric items.
Creating Confidence
To address this, I introduced confidence scores on each rubric card, enabling graders to quickly identify and focus on low-confidence areas flagged by the AI. This helped reduce unnecessary review time for high-confidence scores and ensured that more attention was given to complex or uncertain evaluations.
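A minimal sketch of how low-confidence rubric scores could be surfaced for grader attention follows; the 0–1 confidence value, the threshold, and the names are assumptions, not the shipped logic.

```typescript
// Illustrative sketch: orders rubric cards so low-confidence AI scores surface first.
// The 0–1 confidence value and the 0.7 threshold are assumptions for illustration.
interface RubricCard {
  rubricId: string;
  aiScore: number;
  confidence: number; // 0 (uncertain) to 1 (highly confident)
}

// Flag cards the grader should review closely, then sort least-confident first.
function prioritizeForReview(cards: RubricCard[], threshold = 0.7) {
  return cards
    .map((card) => ({ ...card, needsReview: card.confidence < threshold }))
    .sort((a, b) => a.confidence - b.confidence);
}
```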


Making Adjustments
Additionally, I implemented keyboard hotkeys for faster score entry, minimizing the need for mouse clicks and increasing overall efficiency. This combination of design optimizations helped graders return high-scoring essays more quickly, giving students more time to improve, while ensuring struggling students received the detailed feedback they needed.
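As a sketch of that interaction, a keydown handler like the one below could map number keys to score entry on the focused rubric card; the key bindings and handler names are assumptions, not the shipped implementation.

```typescript
// Illustrative sketch: number keys set the score on the focused rubric card,
// Enter advances to the next card. Bindings and names are assumptions.
function registerGradingHotkeys(
  setScore: (points: number) => void,
  nextCard: () => void,
) {
  document.addEventListener("keydown", (event: KeyboardEvent) => {
    if (event.key >= "0" && event.key <= "9") {
      setScore(Number(event.key)); // e.g. press "1" to award 1 point
    } else if (event.key === ".") {
      setScore(0.5);               // quick entry for a half-point
    } else if (event.key === "Enter") {
      nextCard();                  // move on to the next rubric item
    }
  });
}
```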
Score Report (Self-Grade Only)
Score Report (Graded)



Check out the full experience below!
Notable Impacts & Outcomes
- Successfully decreased the average grading time to less than 15 minutes per essay (avg. 12 mins).
- Created a filterable essay repository that enables retrieval and management of thousands of essay submissions.
- Integrated the very first AI-supported tool into the BARBRI Product Suite, establishing the framework for future machine learning projects.
- Targeted to intake and evaluate over 125,000 student essays per season by Summer 2026 (BARBRI internal data).