A Common AST for Multi-Language Analysis on Submitty
As computer science becomes an increasingly popular field of study, introductory classes grow accordingly. This increases the burden on professors, teaching assistants, and graders to give personalized feedback to each student. Submitty addresses this problem through an open-source homework server. Typically, assignments are graded based on the output of assignment-specific test cases. However, black-box testing provides no insight into the student’s source code. This is problematic because a student may hard-code adversarial outputs to trick the grader. Even if we assume that students act with good intent, black-box testing makes it difficult to provide constructive feedback. Thus, there is a need for static analysis across a broad range of assignments in many courses.
We propose a language-independent static analysis framework for automated grading of homework assignments on Submitty. The framework consists of an intermediate representation (IR) that captures the structural similarity of different programming languages. We refer to this abstract IR as the Common AST. Although our framework is highly language independent, we implement and evaluate it on a codebase that includes C++ and Python. Consider the following Python code segment and corresponding AST:
p = 0
q = 0
for i in range(0, 10):
    p += 1
for j in range(0, 10):
    q += 1
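As a rough illustration of what the Python side of such an AST contains, the node types of this segment can be listed with Python's standard `ast` module. This is a minimal sketch and not the Submitty tooling; the helper name `node_type_counts` is hypothetical, and the indentation of the segment (two sequential loops) is an assumption, although the node counts are the same if the loops are nested.

```python
import ast
from collections import Counter

# The code segment from the text; the loop-body indentation is assumed.
SOURCE = """\
p = 0
q = 0
for i in range(0, 10):
    p += 1
for j in range(0, 10):
    q += 1
"""


def node_type_counts(source: str) -> Counter:
    """Parse the source and count how often each AST node type occurs."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


counts = node_type_counts(SOURCE)
for name, count in sorted(counts.items()):
    print(f"{name}: {count}")
```

The statement-level nodes `Assign`, `AugAssign`, and `For` each appear twice, alongside Python-specific bookkeeping nodes such as `Load` and `Store`.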
The same code rewritten in C++ produces the following AST:
From this example, it is clear that although programming structures are similar across languages, the ASTs themselves are highly language dependent. To run a language-independent static analysis, we create an AST that includes only the features present in every language we analyze. The Common AST consists of the intersection of the nodes in the Python AST and the C++ AST:
Our multi-language analysis provides a consistent interface for instructors to run static analysis on student code regardless of the source language. The Common AST is, however, an abstraction of the complete AST, so our analysis loses information that may have been present in the original concrete AST. In practice, we found that most student code does not contain language-unique constructs. In over 2,000 Computer Science 1 (Rensselaer Polytechnic Institute CS 1100) assignments, only 5.4% of the nodes were not covered by the Common AST. Thus, our framework tends to cover a large majority of the constructs in the original AST.
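The coverage idea can be sketched for the Python side as follows. The mapping `PYTHON_TO_COMMON` and its common node labels are hypothetical, chosen only for illustration; the actual node set of the Common AST is not listed here, and this sketch does not reproduce the 5.4% measurement. It simply treats a Python node as covered when its type has a counterpart in the assumed common set.

```python
import ast

# Hypothetical mapping from Python AST node types to common node labels.
# The real Common AST node set is not given in the text.
PYTHON_TO_COMMON = {
    "Module": "Program",
    "FunctionDef": "Function",
    "Return": "Return",
    "Assign": "Assignment",
    "AugAssign": "Assignment",
    "For": "Loop",
    "While": "Loop",
    "If": "Conditional",
    "Call": "Call",
    "BinOp": "BinaryOperation",
    "Name": "Identifier",
    "Constant": "Literal",
}

# Marker nodes (Load, Store, Add, ...) carry no structure of their own,
# so this sketch leaves them out of the count.
MARKERS = (ast.expr_context, ast.operator)


def coverage(source: str) -> float:
    """Fraction of structural nodes that map to the assumed common set."""
    nodes = [n for n in ast.walk(ast.parse(source)) if not isinstance(n, MARKERS)]
    covered = sum(type(n).__name__ in PYTHON_TO_COMMON for n in nodes)
    return covered / len(nodes)


LOOPS = "p = 0\nfor i in range(0, 10):\n    p += 1\n"
COMPREHENSION = "squares = [x * x for x in range(3)]\n"

print(f"loops: {coverage(LOOPS):.1%}")
print(f"comprehension: {coverage(COMPREHENSION):.1%}")
```

Under these assumptions the loop segment is fully covered, while the list comprehension, a construct with no direct C++ counterpart, lowers the coverage.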
In summary, we provide a valid alternative to language-dependent tools, one that is appealing to developers working in multilingual or agile, quickly changing environments.
TL;DR: A language-independent intermediate representation (IR) for statically analyzing homework assignments on Submitty
Website: Submitty
Publication: Program Analysis Tools in Automated Grading of Homework Assignments