This workshop is an introduction to identifying syntactical elements using QL by implementing a basic Andersen style points-to analysis1.
A points-to analysis is a pointer analysis that statically determines the set of object a variable can point to and provides information that can be used to improve the accuracy of other analyses such as computing a call-graph.
For example, consider the following Java snippet. How do we determine which foo method is called?
To answer the question we need to know to what object a points to, which can be an instance of A or B depending on how bar is called.
class A {
public void foo() {}
}
class B extends A {
@Override
public void foo() {}
}
class C {
public void bar(A a) {
a.foo();
}
}In this workshop the focus will be on syntactically describing elements and not the points-to algorithm. We will learn how to describe:
- How objects are created.
- How methods are called.
- How variables are accessed and assigned a value.
- How fields are accessed, assigned a value, and how to get the qualifier of a field access.
- How to syntactically matchup method arguments with method parameters.
- How to reason about statements in a method.
We will follow the declarative formulation of the basic points-to analysis as found in the book Pointer Analysis2
To determine the objects where variables can point to, we first need to describe what an object is.
During runtime a program can allocate multiple objects using the new ... expression.
Since we are statically analyzing a program we need to approximate object allocation and represent objects by allocation size, that is the program location where an object is allocated using a new ... expression.
Write a query to find all allocation size by looking for the new ... expression.
To test your query run the test exercises-tests/exercise1/Exercise1.qlref via the Visual Studio Code test explorer or with the command codeql test run exercises-tests/exercise1 in a terminal in the root of
Hints
To discover how a program element such as an expression is represented in QL you can use to approaches.- Use the AST viewer to find the element's QL class. You can create a database out of a test case by running
cp -R exercises-tests/exercise1/exercise1.testproj/ exercise1-db && codeql database bundle -o exercise1-db.zip exercise1-db/and select the databaseexercise1-db.zipto use the AST viewer. - Write a query that finds all the values of type
Elementrestricted to a region of code using the element's Location, retrieved withgetLocation(), and obtain its primary QL classes withgetPrimaryQlClassesto find the QL class that best describes the element.
With the allocation site of objects identified we can implement the first step in identifying points-to information.
Implement the alloc predicate that associates the creation of an object with a variable.
The varPointsTo predicate will use this as a first step to determine where variables point to.
Hints
- The class
Variable, of whichLocalScopeVariableis a subclass, supports the member predicategetAnAssignedValue - The class
Exprcontains the member predicategetEnclosingCallableto find theCallable, such as aMethod, the expression occurs in.
The next step is to propagate the points-to set when a variable is assigned a value hold by a variable.
Implement the predicate move to capture the pattern a = b where a and b are both variables.
Hints
- The class
Variable, of whichLocalScopeVariableis a subclass, supports the member predicategetAnAccessto determine where the variable is accessed.
At this point we are able to gather points-to information for variables. Java, and many other languages, supports fields to store information in variables associated with an object. We would like to capture points-to information that is field-sensitive, or in other words, can distinguish where each fields points to.
Implement the predicate store to capture a variable being stored in a field.
Hints
- A
Fieldis aVariableand supports the member predicategetAnAccessto determine where the field is accessed. - An alternative is the class
FieldAccessthat captures field access and supports the predicategetFieldto get theFieldbeing accessed. - A
Fieldis accessed through a qualifier, an expression that references the object the field belongs to. To get the qualifier use the predicategetQualifier. - The expression
AssignExprcaptures an assignment. Use the predicatesgetSrcandgetDestto reason about the constituents of the assignment.
Now that we known of to reason about how variables are stored in fields we want to reason about the reverse, how variables are getting their values from fields.
Implement the predicate load to describe the pattern a = b.c
With both stores and loads of fields we can start to reason about to what objects they point to.
Implement the fieldPointsTo predicate that associates the allocation sites of the qualifier and the variable being assigned to the field.
Hints
- To reason about allocation sites you can use the provided predicate
varPointsTo.
At this point we have established where fields can point to.
We want to propagate this information to variables being assigned a field.
Add a conjunction to the predicate varPointsTo that equates the allocation site for a variable to that of the field it is assigned.
Hints
- To reason about allocation sites for field you can use the predicate
fieldPointsTo. - To reason about variables being assigned a field you can use the predicate
load.
One use of points-to analysis is the construction of a call graph.
To do that we first have to identify method calls and the signature of the method they are calling so we can look up the method when we know where the qualifier points to.
Implement the predicate methodCall to identify method calls, their qualifiers, and signatures.
To get a signature of the method being called from a method call use the provided predicate getSignature.
The reason for the getSignature predicate is that CodeQL doesn't provide the signature information at the call site because it has already resolved the called method during extraction.
Hints
- To reason about method calls you can use the class
MethodAccess. - To reason about method call qualifiers you can use the member predicate
getQualifierprovided by theMethodAccessclass.
To associate method calls with methods we need to be able to lookup methods by their class and their signature.
Implement the predicate getMethod that given a Class and a signature returns a method.
Note: this exercise is purely educational. The standard library provides a more accurate implementation to resolve methods that is accessible from the MethodAccess class with the member predicate getMethod.
Hints
- To reason about a method's signature you can use the member predicate
getSignature. - A method belongs to a reference type, such as a class or interface. To obtain the type declaring the method you can use the member predicate
getDeclaringType.
Now that we can reason about method calls and lookup methods we can construct a call graph using the points-to information of the qualifier of the method call.
Implement the predicate callGraph that for a method call looks up the corresponding method based on the point-so information of the qualifier.
Hints
- To reason about a variable and a use of the variable you can use the member predicate
getAnAccesson a variable or use the classVariableAccessin combination with the member predicategetVariable.
So far we have only reasoned about points-to information intraprocedurally. That is, within a function and not across function boundaries through, for example, calls.
Implement the interproceduralAssign predicate to associate the points-to information of a method call argument to the called method's parameter.
Hints
- To reason about a method call's argument you can use the member predicate
getArgumentorgetAnArgument. - To reason about a method's parameter you can use the member predicate
getParameterorgetAParameter. - Java passes arguments by position.
In the previous exercise we have propagated points-to information from the caller to the callee.
What remains is to propagate points-to information from the callee to the caller that can happen through a return statement.
Add a disjunction to the interproceduralAssign predicate to associate points-to information from the result of a method call to an assignment of a variable.
Hints
-
The
Assignmentclass, representing assignment expressionsx = y, has the member predicatesgetDestandgetSrcto reason about its operands. -
To reason about the statements in a method you can use the
Method's member predicategetBodyto get the method's block statement{...}and theBlockStmts member predicategetAStmt. -
The class
ReturnStmtcan be used to reason aboutreturn ...statements. It's member predicategetResultprovides the expression that is returned. -
QL supports [https://codeql.github.com/docs/ql-language-reference/expressions/#casts] to constrain the type of an expression. For example:
import java from BlockStmt s, Expr result where result = s.getAStmt().(ReturnStmt).getResult() select result