Get some trouble on using taint analysis #124

Closed
f4nx1ng opened this issue Oct 21, 2024 · 21 comments
@f4nx1ng

f4nx1ng commented Oct 21, 2024

📝 Overall Description

When I used Tai-e for taint analysis, I found that taint was not propagated as expected. Specifically, when I tried to detect the URLDNS deserialization gadget chain, Tai-e did not report the flow I expected. Is this a bug?

🎯 Expected Behavior

The result should be a taint flow path from <java.util.HashMap: void readObject(java.io.ObjectInputStream)> to <java.net.URLStreamHandler: int hashCode(java.net.URL)>

🐛 Current Behavior

Tai-e reports no taint flows.

🔄 Reproducible Example

issue-package.zip

⚙️ Tai-e Arguments

Tai-e Options:
optionsFile: null
printHelp: false
classPath: null
appClassPath: java-benchmarks/urldns
mainClass: URLDNS
inputClasses: []
javaVersion: 8
prependJVM: false
allowPhantom: true
worldBuilderClass: pascal.taie.frontend.soot.SootWorldBuilder
outputDir: output
preBuildIR: false
worldCacheMode: false
scope: APP
nativeModel: true
planFile: null
analyses:
#  ir-dumper: ;
  pta: cs:ci;implicit-entries:false;distinguish-string-constants:null;reflection-inference:solar;merge-string-objects:false;merge-string-builders:false;merge-exception-objects:false;taint-config:config/unserialize/taint-config.yml;
onlyGenPlan: false
keepResult:
- $KEEP-ALL
Tai-e Analysis Plan:
- id: pta
  options:
    cs: ci
    only-app: false
    implicit-entries: false
    distinguish-string-constants: null
    merge-string-objects: false
    merge-string-builders: false
    merge-exception-objects: false
    handle-invokedynamic: false
    propagate-types:
    - reference
    advanced: null
    dump: false
    dump-ci: false
    dump-yaml: false
    expected-file: null
    reflection-inference: solar
    reflection-log: null
    taint-config: config/unserialize/taint-config.yml
    plugins: []
    time-limit: -1

📜 Tai-e Log

Writing log to D:\IDEA_Projects\Tai-e\output\tai-e.log
Writing analysis plan to D:\IDEA_Projects\Tai-e\output\tai-e-plan.yml
WorldBuilder starts ...
6702 classes with 64239 methods in the world
WorldBuilder finishes, elapsed time: 2.06s
pta starts ...
Loading taint config from D:\IDEA_Projects\Tai-e\config\unserialize\taint-config.yml
TaintConfig:
sources:
ParamSource{<java.util.HashMap: void readObject(java.io.ObjectInputStream)>/0(java.io.ObjectInputStream)}
CallSource{<java.net.URLStreamHandler: int hashCode(java.net.URL)>/0(java.net.URL)}

sinks:
<java.net.InetAddress: java.net.InetAddress getByName(java.lang.String)>/0
<java.net.URLStreamHandler: int hashCode(java.net.URL)>/base

transfers:
<java.io.ByteArrayInputStream: void <init>(byte[])>: 0 -> base(java.io.ByteArrayInputStream)
<java.io.ObjectInputStream: void <init>(java.io.InputStream)>: 0 -> base(java.io.ObjectInputStream)
<java.io.ObjectInputStream: java.lang.Object readObject()>: base -> result(java.lang.Object)
<java.net.URLStreamHandler: int hashCode(java.net.URL)>: 0 -> base(java.net.URLStreamHandler)

[Pointer analysis] elapsed time: 12.68s
Detected 0 taint flow(s):
TFGDumper starts ...
Source nodes:
VarNode{<java.util.HashMap: void readObject(java.io.ObjectInputStream)>/r0}
Sink nodes:
Dumping D:\IDEA_Projects\Tai-e\output\taint-flow-graph.dot
TFGDumper finishes, elapsed time: 0.27s
-------------- Pointer analysis statistics: --------------
#var pointers:                6,2704 (insens) / 6,2704 (sens)
#objects:                     9035 (insens) / 9035 (sens)
#var points-to:               833,8344 (insens) / 833,8344 (sens)
#static field points-to:      1935 (sens)
#instance field points-to:    282,3617 (sens)
#array points-to:             7,1176 (sens)
#reachable methods:           9201 (insens) / 9201 (sens)
#call graph edges:            5,2780 (insens) / 5,2780 (sens)
----------------------------------------
pta finishes, elapsed time: 14.74s

ℹ️ Additional Information

In addition to the above, I added a Tai-e analysis plugin that registers the entry point, as shown below:

// Package names below assume the current Tai-e source layout.
import java.util.List;

import pascal.taie.analysis.pta.core.solver.DeclaredParamProvider;
import pascal.taie.analysis.pta.core.solver.EntryPoint;
import pascal.taie.analysis.pta.core.solver.Solver;
import pascal.taie.analysis.pta.plugin.Plugin;
import pascal.taie.language.classes.JClass;
import pascal.taie.language.classes.JMethod;

public class UnserializeEntryPointHandler implements Plugin {
    private Solver solver;
    private String findclass = "java.util.HashMap";

    @Override
    public void setSolver(Solver solver) {
        this.solver = solver;
    }

    @Override
    public void onStart() {
        // Add HashMap.readObject as an extra entry point of the pointer analysis.
        List<JClass> list = solver.getHierarchy().allClasses().toList();
        for (JClass jClass : list) {
            if (jClass.getName().equals(findclass)) {
                System.out.println("find class");
                // Look the method up by name; null if the class declares no readObject.
                JMethod jMethod = jClass.getDeclaredMethod("readObject");
                if (jMethod != null) {
                    System.out.println("entry add");
                    solver.addEntryPoint(new EntryPoint(jMethod, new DeclaredParamProvider(jMethod, solver.getHeapModel())));
                }
            }
        }
    }
}
@f4nx1ng f4nx1ng added the type: bug A general bug label Oct 21, 2024
@f4nx1ng
Author

f4nx1ng commented Oct 21, 2024

Tai-e Version is 6150e1f

@zhangt2333
Member

What you've provided is the program being analyzed, not a reproducible example, so I can't delve into your issue. Could you provide a reproducible example?

Could you try the latest version of Tai-e? (i.e., v0.5.1-SNAPSHOT)

@f4nx1ng
Author

f4nx1ng commented Oct 22, 2024

The GitHub issue form will not accept the entire Tai-e project because it is too large. You can access all of my project files through the following link: https://pan.baidu.com/s/1fl8dMh9mh4E8BeAuUePerQ?pwd=gnbg
The config file I used is in \config\unserialize\options.yml.
Thanks for your reply. I will also try the latest version.

@f4nx1ng
Author

f4nx1ng commented Oct 22, 2024

I ran my case again on the latest version of Tai-e, and it did not solve my problem. This is not a class-loading error: I added some code to check whether the java.net.URLStreamHandler class and its methods were actually loaded and analyzed, and Tai-e's output looks normal. In other words, Tai-e loaded the class but did not propagate taint to the target method during taint analysis.

@jjppp
Member

jjppp commented Oct 22, 2024

Hi @f4nx1ng, I just downloaded your project files and quickly examined the option files they contain. The project does not seem consistent with your comments above (e.g., config/unserialize/taint-config.yml contains different sources and sinks from what you pasted here). Could you please provide a better minimal working example (e.g., by forking Tai-e, committing your changes to the fork, and sharing the repo)?

@f4nx1ng
Author

f4nx1ng commented Oct 22, 2024

Sorry, the content of taint-config.yml is inconsistent with the log above. I wanted to make the problem easier for you to debug, so I commented out some irrelevant or incorrect entries. I apologize if this caused confusion, but the rest still accurately reflects the problem I encountered.

@f4nx1ng
Author

f4nx1ng commented Oct 22, 2024

And now I have forked a copy of Tai-e in my repository. Thanks for your advice.

@f4nx1ng
Author

f4nx1ng commented Oct 22, 2024

@f4nx1ng
Author

f4nx1ng commented Oct 23, 2024

I may have found where the problem lies. From the perspective of pointer analysis, the variable of type URLStreamHandler points to null. Because URLStreamHandler is an abstract class, there are no new statements for it. Does Tai-e have any handling for method calls on abstract classes?

@jjppp
Member

jjppp commented Oct 23, 2024

Hi @f4nx1ng, I tried to understand how the URLDNS gadget works by digging into the dynamic execution of your example. Please correct me if I got it wrong:

  1. URLDNS.class is a java8 application
  2. URL.hashCode() will delegate to URL.handler.hashCode(URL)
  3. Initialization to instance field URL.handler is done via static method URL.getURLStreamHandler(String)
  4. URL.getURLStreamHandler(String) calls ClassLoader.loadClass(PROTOCOL_NAME + ".Handler").newInstance() to obtain the real handler
  5. The handler is then used to compute the hashCode of the url object

And yes, URLStreamHandler is an abstract class, but it is initialized via reflection APIs. The results of the pointer analysis did miss the points-to-set of the field URL.handler, and I'm working to understand why it was missing.
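
For readers unfamiliar with the gadget, the first hop of the chain (HashMap.readObject calling hashCode() on each deserialized key) can be observed without any network access. The following is a minimal sketch, not the URLDNS payload itself: ProbeKey is a hypothetical stand-in for java.net.URL that counts hashCode() calls instead of delegating to a URLStreamHandler.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

class HashCodeOnDeserialize {
    /** Hypothetical stand-in for java.net.URL: counts hashCode() calls. */
    static class ProbeKey implements Serializable {
        private static final long serialVersionUID = 1L;
        static int hashCodeCalls = 0;

        @Override
        public int hashCode() {
            hashCodeCalls++;
            return 42;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof ProbeKey;
        }
    }

    /** Returns how many times hashCode() ran while the map was being deserialized. */
    static int countCallsDuringReadObject() {
        try {
            HashMap<ProbeKey, String> map = new HashMap<>();
            map.put(new ProbeKey(), "value");

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(map);
            }

            // Reset so that calls made by put() above are not counted.
            ProbeKey.hashCodeCalls = 0;
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
            return ProbeKey.hashCodeCalls;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("hashCode() calls during readObject: " + countCallsDuringReadObject());
    }
}
```

With a real java.net.URL key, that hashCode() call is the one that delegates to URL.handler.hashCode(URL), as in steps 2 to 5 above.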

@f4nx1ng
Author

f4nx1ng commented Oct 23, 2024

Thanks for your reply. Your understanding of how the URLDNS example works is correct. I am currently looking for a way in Tai-e to initialize a variable's points-to set the next time I encounter a similar problem (e.g., when the pts is empty, add a required obj/csobj to it).

@jjppp
Member

jjppp commented Oct 23, 2024

As for the question of handler pointing to null, the reason is that Tai-e currently cannot resolve the target methods of reflective calls with complex arguments (e.g., string concatenation with variable prefixes), as reflection is inherently hard to analyze statically. An example, Append.java, adapted from URL.getURLStreamHandler(String), is given below to illustrate this issue.

To make the best use of Tai-e in the presence of reflection methods, try:

  1. Set distinguish-string-constants:all;reflection-inference:solar;merge-string-objects:false
  2. Provide a reflection log via the option reflection-log. Reflection logs can be obtained by running the application under analysis with TamiFlex
  3. Write a taint transfer explicitly

You can try out this example yourself with/without the reflection logs provided.

public class Append {
    public static void main(String[] args) throws Exception {
        String var5 = "sun.net.www.protocol";
        String var0 = "http";
        Class cls = Class.forName(var5 + "." + var0 + ".Handler");
        Object o = cls.newInstance();
    }
}

without reflection logs:

<Append: void main(java.lang.String[])>/java.lang.Class.forName/0	<java.lang.Class: java.lang.Class forName(java.lang.String)>
<Append: void main(java.lang.String[])>/java.lang.Class.newInstance/0	<java.lang.Class: java.lang.Object newInstance()>

with reflection logs provided:

<Append: void main(java.lang.String[])>/java.lang.Class.forName/0	<java.lang.Class: java.lang.Class forName(java.lang.String)>
<Append: void main(java.lang.String[])>/java.lang.Class.newInstance/0	<java.lang.Class: java.lang.Object newInstance()>
<Append: void main(java.lang.String[])>/java.lang.Class.newInstance/0	<sun.net.www.protocol.http.Handler: void <init>()>

And the reflection logs generated by Tamiflex:

Class.forName;sun.net.www.protocol.http.Handler;Append.main;5;;
Class.getDeclaredField;<java.lang.invoke.MethodHandle: java.lang.invoke.LambdaForm form>;java.lang.invoke.MethodHandle.<clinit>;1451;isAccessible=false;
Class.getMethod;<Append: void main(java.lang.String[])>;sun.launcher.LauncherHelper.validateMainClass;670;isAccessible=false;
Class.newInstance;sun.net.www.protocol.http.Handler;Append.main;6;;
Constructor.getModifiers;<sun.net.www.protocol.http.Handler: void <init>()>;java.lang.Class.newInstance;;isAccessible=true;
Constructor.newInstance;<sun.net.www.protocol.http.Handler: void <init>()>;java.lang.Class.newInstance;;isAccessible=true;
Field.getName;<java.lang.invoke.MethodHandle: java.lang.invoke.LambdaForm form>;java.lang.Class.searchFields;;isAccessible=false;
Field.getName;<java.lang.invoke.MethodHandle: java.lang.invoke.MethodType type>;java.lang.Class.searchFields;;isAccessible=false;
Method.getModifiers;<Append: void main(java.lang.String[])>;sun.launcher.LauncherHelper.validateMainClass;683;isAccessible=false;
Method.getName;<Append: void main(java.lang.String[])>;java.lang.Class.searchMethods;;isAccessible=false;
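
To make the log above easier to read: each line appears to be a semicolon-separated record holding the reflective call kind, its target, the calling method, and the source line. The sketch below parses lines under that assumption and lists the classes instantiated through Class.newInstance. The field names and the ReflectionLogSketch class are this example's own labels, not part of TamiFlex or Tai-e.

```java
import java.util.ArrayList;
import java.util.List;

class ReflectionLogSketch {
    /** One parsed log line; the field names are labels chosen for this sketch. */
    static final class Entry {
        final String kind;
        final String target;
        final String caller;
        final String line;

        Entry(String kind, String target, String caller, String line) {
            this.kind = kind;
            this.target = target;
            this.caller = caller;
            this.line = line;
        }
    }

    /** Splits each non-empty line on ';' and keeps the first four fields. */
    static List<Entry> parse(String log) {
        List<Entry> entries = new ArrayList<>();
        for (String raw : log.split("\n")) {
            if (raw.trim().isEmpty()) {
                continue;
            }
            // limit -1 keeps trailing empty fields such as the ";;" at the end
            String[] fields = raw.trim().split(";", -1);
            if (fields.length < 4) {
                continue; // not a record in the assumed format
            }
            entries.add(new Entry(fields[0], fields[1], fields[2], fields[3]));
        }
        return entries;
    }

    /** Returns the targets of all Class.newInstance records, in log order. */
    static List<String> instantiatedClasses(String log) {
        List<String> classes = new ArrayList<>();
        for (Entry entry : parse(log)) {
            if (entry.kind.equals("Class.newInstance")) {
                classes.add(entry.target);
            }
        }
        return classes;
    }
}
```

Applied to the log above, the only Class.newInstance record names sun.net.www.protocol.http.Handler, which matches the extra call edge to its constructor in the "with reflection logs provided" output.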

@f4nx1ng
Author

f4nx1ng commented Oct 23, 2024

Oh, thank you for your reply. Although this method is already very good, I still hope there is a more automated way to solve this problem. So I added the following code to the plugin:

    public void onPhaseFinish() {
        solver.getCallGraph().reachableMethods().forEach(csMethod -> {
            if (csMethod.getMethod().getDeclaringClass().getName().equals("java.net.URL")){
                csMethod.getMethod().getIR().getStmts().forEach(stmt1 -> {
                    if(stmt1 instanceof Invoke invoke && (invoke.isVirtual() || invoke.isInterface()) && invoke.getRValue() instanceof InvokeInstanceExp invokeInstanceExp){
                        Var var = invokeInstanceExp.getBase();
                        Context context = csMethod.getContext();
                        if (solver.getCSManager().getCSVar(context, var).getPointsToSet() == null || solver.getCSManager().getCSVar(context, var).getPointsToSet().isEmpty()){
                            JClass jclass = World.get().getClassHierarchy().getClass(var.getType().getName());
                            Collection<JClass> implementors = new ArrayList<>();
                            if(invoke.isInterface()){
                                implementors.addAll(World.get().getClassHierarchy().getDirectImplementorsOf(jclass));
                            }else {
                                implementors.add(jclass);
                                implementors.addAll(World.get().getClassHierarchy().getDirectSubclassesOf(jclass));
                            }
                            System.out.printf("%s %s %s %s\n", csMethod.getMethod().getName(), var, jclass, implementors);
                            implementors.forEach(implementor ->{
                                solver.addPointsTo(solver.getCSManager().getCSVar(csMethod.getContext(), var), csMethod.getContext(), solver.getHeapModel().getMockObj(()->"Unserialzie", implementor.getName(), implementor.getType()));
                            });
                        }
                    }
                });
            }
        });
    }

The code above provides a points-to set for URLStreamHandler, but the taint analysis still does not produce the expected flow.

@jjppp
Member

jjppp commented Oct 23, 2024

It seems that some points-to sets are still missing because of reflection APIs. The invocation of ObjectInputStream.readObject() in the main method gives the following call chain:

  1. ObjectInputStream.readObject()
  2. ObjectInputStream.readObject(Class<?>)
  3. ObjectInputStream.readObject(Class<?>, boolean)
  4. ObjectInputStream.readOrdinaryObject(boolean)
  5. readSerialData(Object, ObjectStreamClass)
  6. invokeReadObject(Object, ObjectInputStream)

In the method invokeReadObject(Object, ObjectInputStream), the method stored in the instance field ObjectStreamClass.readObjectMethod is invoked. That field was previously obtained via the reflection API ObjectStreamClass.getPrivateMethod(var1, "readObject", new Class[]{ObjectInputStream.class}, Void.TYPE).

If you check pta-results.txt, you will find that the instance field readObjectMethod is null for all instances of class ObjectStreamClass.

This is, again, a reflection-related issue caused by this notorious Java feature. Currently Tai-e employs a heuristic to balance precision and soundness by simplifying the handling of reflection APIs inside libraries (i.e., outside the application). Again, you may need to provide a reflection log, as described in the documentation, to reproduce the dynamic behavior of the application, or write some taint transfers to guide Tai-e to discover some of the missing function calls and objects.

Also, it may be helpful if you could explain how this taint flow really works (e.g., by providing a comprehensive concrete execution path), as it seems more like a misuse of Tai-e than a bug.
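
The reflective step described above can be illustrated in isolation. This is a sketch of the general pattern only (look up a private readObject hook by name and signature, then invoke it through java.lang.reflect.Method), not the JDK's ObjectStreamClass code. Gadget is a hypothetical class, and a null stream is passed because the hook in this sketch never reads from it.

```java
import java.io.ObjectInputStream;
import java.lang.reflect.Method;

class PrivateReadObjectLookup {
    /** Hypothetical class with a private readObject hook. */
    static class Gadget {
        boolean hookRan = false;

        private void readObject(ObjectInputStream in) {
            hookRan = true; // a real hook would read fields from `in`
        }
    }

    /** Finds the private hook reflectively and invokes it on the given target. */
    static boolean invokeHook(Gadget target) {
        try {
            Method hook = Gadget.class.getDeclaredMethod("readObject", ObjectInputStream.class);
            hook.setAccessible(true);
            // The cast makes null a single argument rather than a null varargs array.
            hook.invoke(target, (Object) null);
            return target.hookRan;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("hook ran: " + invokeHook(new Gadget()));
    }
}
```

No call site in this code names Gadget.readObject directly, so a static analysis only sees the edge if it can resolve which Method object reaches invoke. That is the information a reflection log supplies.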

@f4nx1ng
Author

f4nx1ng commented Oct 24, 2024

In fact, I tried to avoid the problem you mentioned when I first wrote the code. My plugin adds HashMap.readObject() as an additional entry method, which sidesteps it: because the analysis starts inside the JDK, the problem you mentioned does not affect the subsequent taint propagation. You can see more details in my plugin: https://github.com/f4nx1ng/Tai-e/blob/master/src/main/java/pascal/taie/analysis/pta/plugin/UnserializeEntryPointHandler.java

@f4nx1ng
Author

f4nx1ng commented Oct 24, 2024

After debugging for a while, I have some information to share.

  1. My goal is to find a path from <java.util.HashMap: void readObject(java.io.ObjectInputStream)> to <java.net.URL: int hashCode()> and finally to <java.net.URLStreamHandler: int hashCode(java.net.URL)>. Tai-e can only find the first half of the path, from <java.util.HashMap: void readObject(java.io.ObjectInputStream)> to <java.net.URL: int hashCode()>. This is because URLStreamHandler points to null.
  2. In fact, the URLDNS class I provided is a failure case, and the final entry does not depend on it.
  3. I put the entry processing in the onStart() function of UnserializeEntryPointHandler, which adds HashMap.readObject() as an entry. I also implemented the onPhaseFinish() method in this plugin, which adds a points-to set for URLStreamHandler. After that, the URLStreamHandler pointer is no longer null (this has been verified).
  4. Even after doing the above, the taint analysis still did not find the corresponding path. I am currently looking for the problem in the processCall function.

@f4nx1ng
Author

f4nx1ng commented Oct 24, 2024

Oh, I found the problem. It is due to multiple type conversions in the gadget chain. Because the resulting type is unpredictable, the type of the tainted object is not resolved correctly, which causes the resolvecall function to resolve the call to the wrong method.
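
One plausible reading of this diagnosis, offered as an assumption rather than a confirmed account of the fix: when a key is only visible as Object (as it is after being read from the stream and cast), the target of key.hashCode() depends entirely on the type of the object the variable points to. The sketch below shows that dependence with two hypothetical classes. UrlLike stands in for java.net.URL.

```java
class DispatchByRuntimeType {
    /** Hypothetical stand-in for java.net.URL. */
    static class UrlLike {
        @Override
        public int hashCode() {
            return 1;
        }
    }

    /** Hypothetical unrelated type that could occupy the same variable. */
    static class Other {
        @Override
        public int hashCode() {
            return 2;
        }
    }

    /** A hash helper that, like a generic map, only sees its key as Object. */
    static int hash(Object key) {
        return key == null ? 0 : key.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(hash(new UrlLike())); // dispatches to UrlLike.hashCode
        System.out.println(hash(new Other()));   // dispatches to Other.hashCode
    }
}
```

If the abstract object flowing into such a call carries a type other than the one intended, call resolution picks that type's method and the expected call edge never appears.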

@f4nx1ng
Author

f4nx1ng commented Oct 24, 2024

Thank you very much for your help during this period

@f4nx1ng f4nx1ng closed this as completed Oct 24, 2024
@zhangt2333 zhangt2333 added type: question and removed type: bug A general bug labels Oct 24, 2024
@xiziyunqi105

xiziyunqi105 commented Nov 28, 2024

Hello @f4nx1ng, what does "2024/11/1 question 1" refer to here? I replaced SourceHandler.java and the problem (#129) was solved. Thank you.
02dabcc#diff-2c1a6bf9a017ef48994ac3d9f0b6ff3c5b65b96470b532247d33bf53bad59478R207

@f4nx1ng
Author

f4nx1ng commented Nov 29, 2024

I am very happy that my code helped you solve the problem, but this code was not meant to be public because of academic originality concerns. That was an oversight in how I managed the repository. Still, this rough code did help you solve your problem, so I am unsure whether to keep the source code public.
As for the problem I mentioned in the comments on 11/1: as you can see, my work targets the discovery of Java deserialization vulnerabilities. The additional mechanism I added to SourceHandler is "source self-pollution". Specifically, in a deserialization vulnerability, the entry is usually something like java.util.PriorityQueue: void readObject(java.io.ObjectInputStream). In Tai-e, we can only mark a parameter as a taint source, that is, taint java.io.ObjectInputStream ois. For a deserialization vulnerability, however, the tainted data already resides in the object of class java.util.PriorityQueue itself (the %this object of readObject()). Tai-e does not provide a way to express this, so I added my own mechanism to meet this need.
This code is just a basic test version, and more code or papers about this research may be made public in the future. If you like it, you can star this repo.
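
To illustrate why the receiver matters here, the sketch below shows that inside a readObject hook the attacker-controlled value is reachable through a field of this rather than through anything the hook reads explicitly from its parameter. Holder and its fields are hypothetical names, and the example uses only standard Java serialization. It does not reproduce the SourceHandler change.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ReceiverCarriesData {
    /** Hypothetical serializable class with a custom readObject hook. */
    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        String payload;
        transient String seenInHook; // not serialized; set only by the hook

        Holder(String payload) {
            this.payload = payload;
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();    // fills this.payload from the stream
            seenInHook = this.payload; // stream-controlled value, reached via `this`
        }
    }

    /** Serializes a Holder and returns what its hook observed during deserialization. */
    static String roundTrip(String payload) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new Holder(payload));
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return ((Holder) in.readObject()).seenInHook;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("attacker-data"));
    }
}
```

A source rule that taints only the ObjectInputStream parameter would have to model defaultReadObject() to reach this.payload, whereas treating the receiver as tainted covers it directly.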

@xiziyunqi105

Thanks for your reply. I see what you mean: the source here is an object (#129), so the cause may be similar.
