Performance Profiling Guide for Python

Profilers help identify performance problems. They are tools that collect metrics revealing the slowest parts of the code, so that optimization effort goes where it really matters. Profilers can gather a wide variety of metrics: wall time, CPU time, network or memory consumption, I/O operations, etc.
Profilers can answer questions such as:

  • How many times is each method in my code called?
  • How long does each of these methods take?
  • How much memory does the method consume?

There are two main types of profilers:
  • Deterministic profiling: Deterministic profilers execute trace functions at various points of interest (such as function calls and returns) and record precise timings of these events. As a result, code runs slower under profiling, which often makes deterministic profilers impractical in production systems.

  • Statistical profiling: Instead of tracking every event (such as every function call), statistical profilers interrupt the application periodically and collect samples of its execution state (call stack snapshots). The samples are then analyzed to estimate the execution time of different parts of the application. This method is less precise, but its overhead is much lower.
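To make the distinction concrete, here is a minimal pure-Python sketch of both approaches. It is illustrative only: real profilers are implemented far more efficiently. `count_calls` installs a hook with `sys.setprofile` so that every function call is recorded, while `sample_stacks` runs a background thread that periodically reads the target thread's current frame via the CPython-specific `sys._current_frames()`. The helper names, the `fib`/`busy` workloads, and the 5 ms sampling interval are hypothetical choices for this example.

```python
import collections
import sys
import threading
import time


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def busy(seconds):
    end = time.perf_counter() + seconds
    spins = 0
    while time.perf_counter() < end:
        spins += 1
    return spins


def count_calls(func, *args):
    """Deterministic: a hook runs on every Python function call, so counts are exact."""
    counts = collections.Counter()

    def hook(frame, event, arg):
        if event == "call":  # entry into a Python function
            counts[frame.f_code.co_name] += 1

    previous = sys.getprofile()
    sys.setprofile(hook)
    try:
        func(*args)
    finally:
        sys.setprofile(previous)
    return counts


def sample_stacks(func, *args, interval=0.005):
    """Statistical: a background thread periodically records which function is running."""
    samples = collections.Counter()
    target = threading.get_ident()  # id of the thread that will run func
    done = threading.Event()

    def sampler():
        while not done.is_set():
            frame = sys._current_frames().get(target)  # CPython-specific
            if frame is not None:
                samples[frame.f_code.co_name] += 1  # innermost function only
            time.sleep(interval)

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        func(*args)
    finally:
        done.set()
        thread.join()
    return samples


if __name__ == "__main__":
    print(count_calls(fib, 10)["fib"])  # 177: every call is recorded
    print(sample_stacks(busy, 0.2).most_common(3))  # approximate, varies per run
```

The deterministic counter reports exactly 177 calls for `fib(10)` because the hook sees every call. The sampler only records which function happened to be running at each tick, so its counts vary from run to run and approximate each function's share of the time.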

Most of the profilers discussed here (Memory Profiler, Line Profiler, cProfile, profile, and FunctionTrace) are deterministic profilers that record every event they track; Scalene and VTune instead rely on sampling. Note that the Memory Profiler package also includes the mprof tool, which samples memory usage over time; it is discussed briefly in the Memory Profiler notebook.

This repository demonstrates several Python profilers and explains in detail how to profile different workloads with each of them. The profilers we will discuss are listed below; each has a separate folder with a Jupyter Notebook to guide you.

Each profiler below is listed with its granularity (lines or functions) and a description of its capabilities.
Memory Profiler (granularity: lines)
  • It provides memory consumption of each individual line inside the function.
  • Minimal code modification is required.
  • It is generally used after identifying hotspot functions from a function profiler.
  • It does not profile GPU workloads.
  • It cannot profile individual threads.
  • It does not provide execution time information.
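Memory Profiler is a third-party package, so as a dependency-free illustration of line-level memory attribution, the sketch below swaps in the standard library's `tracemalloc` instead. This is not how Memory Profiler works: Memory Profiler reports per-line increments in process memory, whereas `tracemalloc` reports the bytes still held by Python when the snapshot is taken, grouped by the source line that allocated them, plus the peak traced size. `build_lists` and `allocations_by_line` are hypothetical names.

```python
import tracemalloc


def build_lists():
    small = [1] * 10_000        # kept: returned to the caller (~80 KB of pointers on 64-bit)
    large = [2] * 1_000_000     # temporary: ~8 MB, freed before returning
    del large
    return small


def allocations_by_line(func, *args, top=3):
    """Run func under tracemalloc; return (result, peak_bytes, top allocation sites)."""
    tracemalloc.start()
    try:
        result = func(*args)
        snapshot = tracemalloc.take_snapshot()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    # Group surviving allocations by the file and line that made them, largest first.
    sites = [
        (stat.traceback[0].filename, stat.traceback[0].lineno, stat.size)
        for stat in snapshot.statistics("lineno")[:top]
    ]
    return result, peak, sites


if __name__ == "__main__":
    result, peak, sites = allocations_by_line(build_lists)
    print(f"peak traced memory: {peak / 1e6:.1f} MB")
    for filename, lineno, size in sites:
        print(f"{filename}:{lineno}: {size / 1e3:.1f} KB still allocated")
```

The temporary `large` list shows up only in the peak figure, since it is freed before the snapshot; seeing that kind of transient spike per line is exactly what a line-by-line memory profiler adds.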
Line Profiler (granularity: lines)
  • It times the execution of each individual line inside the function.
  • No code modification is required.
  • It is generally used after identifying hotspot functions from a function profiler.
  • It does not profile GPU workloads.
  • It cannot profile individual threads.
  • It does not provide memory consumption information.
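Per-line timings come from a line-level trace hook. The following pure-Python sketch shows that idea using the standard `sys.settrace` hook; it is not Line Profiler's actual implementation and its overhead is much higher. As in Line Profiler's output, time spent in called functions is charged to the calling line. `time_lines` and `workload` are hypothetical names, and the sketch assumes the profiled function is not recursive.

```python
import sys
import time
from collections import defaultdict


def workload(n):
    squares = [i * i for i in range(n)]
    total = sum(squares)
    text = ",".join(str(s) for s in squares)
    return total, len(text)


def time_lines(func, *args):
    """Return (result, {lineno: [hits, seconds]}) for the lines of func's own body."""
    code = func.__code__
    stats = defaultdict(lambda: [0, 0.0])
    state = {"line": None, "t": 0.0}  # line currently being timed and when timing resumed

    def local_trace(frame, event, arg):
        now = time.perf_counter()
        if state["line"] is not None:
            # Charge the time since the previous event to the line that was running,
            # including any time spent in functions it called.
            stats[state["line"]][1] += now - state["t"]
        if event == "line":
            stats[frame.f_lineno][0] += 1
            state["line"] = frame.f_lineno
        elif event == "return":
            state["line"] = None  # func has finished; stop charging time
        state["t"] = time.perf_counter()  # exclude this hook's own work from the timings
        return local_trace

    def global_trace(frame, event, arg):
        # Called for each new frame; request line events only for func's code object.
        return local_trace if frame.f_code is code else None

    previous = sys.gettrace()
    sys.settrace(global_trace)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, dict(stats)


if __name__ == "__main__":
    _, stats = time_lines(workload, 100_000)
    first = workload.__code__.co_firstlineno
    for lineno, (hits, seconds) in sorted(stats.items()):
        print(f"def line +{lineno - first}: {hits:>7} hits {seconds * 1e3:9.2f} ms")
```

Hit counts for a line containing a loop or comprehension can exceed one, depending on the Python version, because a line event may be reported each time a loop jumps back to that line.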
cProfile (granularity: functions)
  • It times the execution of different functions.
  • No code modification is required.
  • It provides a call stack graph and execution time of functions that help identify hotspots.
  • It does not profile GPU workloads.
  • It cannot profile individual threads.
  • It does not provide memory consumption information.
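A minimal sketch of profiling a single function with cProfile and summarizing the results with pstats, both from the standard library. `slow_square`, `workload`, and `profile_with_cprofile` are hypothetical stand-ins for your own code; sorting by cumulative time (which includes time spent in callees) brings hotspots to the top of the report.

```python
import cProfile
import io
import pstats


def slow_square(x):
    return sum(x for _ in range(x))  # deliberately inefficient way to compute x * x


def workload(n):
    return [slow_square(i) for i in range(n)]


def profile_with_cprofile(func, *args, top=5):
    """Run func under cProfile; return (result, pstats.Stats, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Cumulative time includes time spent in callees, so hotspots rise to the top.
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return result, stats, stream.getvalue()


if __name__ == "__main__":
    _, _, report = profile_with_cprofile(workload, 300)
    print(report)
```

In the printed report, the `ncalls` column answers "how many times is each function called?" and `tottime`/`cumtime` answer "how long does each take?". For a whole script, `python -m cProfile -s cumulative your_script.py` prints a similar report without modifying the code.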
profile (granularity: functions)
  • It times the execution of different functions.
  • No code modification is required.
  • It provides a call stack graph and execution time of functions that help identify hotspots.
  • It does not profile GPU workloads.
  • Unlike cProfile, it can profile individual threads, but it has more overhead than cProfile.
  • It does not provide memory consumption information.
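The pure-Python `profile` module exposes the same `Profile` interface as cProfile (`runcall`, and the resulting object can be passed to `pstats.Stats`), so the two are interchangeable in code. The sketch below runs a hypothetical recursive `fib` workload under each and compares wall-clock time; on CPython the `profile` run is typically much slower, reflecting its higher overhead. Exact timings vary by machine, so none are shown.

```python
import cProfile
import profile
import pstats
import time


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def timed_run(profiler_cls, func, *args):
    """Run func under a fresh profiler instance; return (result, pstats.Stats, wall seconds)."""
    profiler = profiler_cls()
    start = time.perf_counter()
    result = profiler.runcall(func, *args)
    elapsed = time.perf_counter() - start
    return result, pstats.Stats(profiler), elapsed


if __name__ == "__main__":
    for cls in (cProfile.Profile, profile.Profile):
        result, stats, elapsed = timed_run(cls, fib, 20)
        # pstats entries are keyed by (filename, lineno, funcname); value[1] is the total call count.
        total_calls = {key[2]: value[1] for key, value in stats.stats.items()}["fib"]
        print(f"{cls.__module__}.{cls.__name__}: {total_calls} calls, {elapsed:.3f} s")
```

Both profilers report the same call counts; only the cost of collecting them differs.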
FunctionTrace (granularity: functions)
  • It times the execution of different functions but only supports Python versions above 3.5.
  • No code modification is required.
  • It provides stack charts, flame graphs, and call trees that help identify hotspots.
  • It does not profile GPU workloads.
  • It can profile individual threads.
  • It does not provide memory consumption information.
  • Profiling results can be shared very easily through a web browser.
Scalene (granularity: functions and lines)
  • It times the execution of different functions and lines but only supports Python versions above 3.7.
  • No code modification is required.
  • It does not provide call stack information.
  • It can profile GPU workloads.
  • It can profile individual threads.
  • It provides memory consumption information.
  • Profiling results can be shared very easily through a web browser.
  • It integrates with GPT-3; when this is enabled, it can suggest changes to optimize the code.
VTune (granularity: functions and lines)
  • It times the execution of different functions and lines, and it also supports other languages such as C and Java.
  • Minimal code modification is required, and it provides an easy-to-use GUI.
  • It provides call stack information, flame graphs, and hardware utilization metrics.
  • It can profile GPU workloads.
  • It can profile individual threads.
  • It provides memory consumption information.
  • Profiling results can be shared very easily through its web browser interface.
  • It also reports low-level C and C++ functions that may be hotspots.
  • Its profiling overhead is higher than that of the other profilers.

We will also use the following Intel AI Reference Kit in our profiling examples:

  • Scikit-Learn Intelligent Indexing for Incoming Correspondence – Ref Kit


Follow the steps in the Intelligent Indexing Ref Kit's GitHub README to set up the environments. The process involves:

  • Setting up virtual environments for both stock and Intel®-accelerated machine learning packages
  • Preprocessing data using Pandas*/Intel® Distribution of Modin and NLTK
  • Training an NLP model for text classification using Scikit-Learn*/Intel® Extension for Scikit-Learn*