-
Notifications
You must be signed in to change notification settings - Fork 1.2k
LogAnalyzer
The scope of SONiC LogAnalyzer is to be able to analyze log content produced by SONiC during the run of PTF SONiC tests. The analysis will be performed after running each PTF test.
PTF tests are running on a host machine connected to the DUT switch, and perform functional testing and validation of the SONiC . As part of test outcome validation, we also need to validate all the logs produced by SONiC system to check for any possible errors reported.
We will introduce new test component, LogAnalyzer, written in Python language, which will be performing log analysis.
There will be an Ansible script orchestrating whole test setup and run. The script will setup DUT, and PTF host. In respect to log analysis it'll perform following actions:
- Deploy PTF test on test host
- Deploy LogAnalyzer and all needed files on switch DUT.
- Run LogAnalyzer to add start-marker
- Start PTF test
- After PFT test finishes, collect the test report
- Run LogAnalyzer to add end-marker
- Run LogAnalyzer to perform analysis
- Collect the report output and provide test summary
The Ansible script will deploy LogAnalyzer as well as all configuration files needed by it to run.
SONiC is a collection of several executables, logging into various locations. Currently logging is done in
- /var/log/syslog
- /var/log/bgpd.log
- /var/log/zebra.log
LogAnalyzer will be parametrized to accept a list of log files it'll need to analyze, to allow any set of files to be specified in general.
To avoid log rotation happening while tests are running, Ansible script will perform log rotation at initialization.
LogAnalyzer will perform analysis of logs produced by running SONiC system. LogAnalyzer will generate output file with analysis summary, which will be consumed by the Ansible script.
Output produced by LogAnalyzer will be consumed by the Ansible script. LogAnalyzer will analyze portion of the logs during which PTF test has ran.
To analyze only logs produced during test run, the Ansible script will invoke LogAnalyzer before starting test activity, so LogAnalyzer would know the starting point of the analysis. On invocation LogAnalyzer will write a unique start-marker into the log files. This will define starting point of log analysis. All logs following after start-marker will be considered for analysis.
When PTF test finishes, Ansible script will invoke LogAnalyzer to put unique end-marker into the log file. This defines ending point of log analysis. Any messages following end-marker will not be considered during log analysis.
Start Marker format: "start-LogAnalyzer-runID"
End marker format: "end-LogAnalyzer-runID"
'runID' - is a string passed in by the caller which identifies the current run. Ansible script will provide the runID.
The runID will be concatenation of test name and time-date (with resolution including seconds) stamp to uniquely identify test and running date.
When instructed to perform analysis, LogAnalyzer will iterate over specified log files and
- find start-marker
- perform analysis
- stop analysis once discovered end-marker.
As part of input LogAnalyzer will be provided with file representing the set of types of messages to look for and detect as errors or warnings. The file will contain a set of regular expressions, log.analysis.match.set.txt. The file will be placed on the DUT as part of setup and path provided to LogAnalyzer at startup.
For each PTF test there will be provided its own version of the file with content specific for the test. The file will be authored by the test developer.
The messages will contain following keywords: ERROR, ERR, WARN, WARNING, panic.
When LogAnalyzer scans the log files it will match log content against the marching set. All matching messages will be collected in a separate file, log.analysis.result.log.
SONiC consists of various components which perform logging. Some of the components will BGP related logs, some other components log about learned IP addresses. During runtime, any of the SONiC components can asynchronously output an error or warning message, which is not related to the current test being ran. For example, while running trap tests, there can be error messages about learned IP addresses. Such messages should be ignored, otherwise we'll face many false positives.
To address this issue we will provide data file to LogAnalyzer where all non-relevant messages will be listed, as regular expressions - log.analysis.ignore.set.txt.
Another example of such messages are platform specific messages.
For each PTF test there will be its private version of the file with the ignore content. This file will be created along with the PTF test development.
For each log file LogAnalyzer will analyzer all messages between start-marker and end-marker in following way:
- Compare message against expressions in log.analysis.ignore.set.txt.
- if message matches - skip it.
- otherwise, compare against set of expressions in log.analysis.match.set.txt
- if messages matches collect message into log.analysis.result.log
When LogAnalyzer finishes it will return number of detected messages matching regex expressions. value 0 will mean no concerning messages detected.
LogAnalyzer will produce a log.analysis.summary.log with summary of found issues. The summary will be collected by the Ansible script and presented to user after test runs.
The format will be:
LOG ANALYSIS SUMMARY:
MESSAGES DETECTED: NUMBER
If LogAnalyzer detects errors in logs, it will need to also collect a DUMP file representing full content of SAI and underlying SDK.
In event of identifying errors LogAnalyzer will return non-0 value to the invoking Ansible script, which will in turn create backup of log files for post-mortem analysis.
Log analyzer will run in 2 modes: to initialize analysis, and to perform analysis.
Ansible script will invoke LogAnalyzer and instruct to insert start-marker into all log files.
The parameters specified for this phase will be:
- action="init"
- log_file_list=path,{path}
- start_marker="string"
After PTF test runs and finishes, the Ansible script will invoke LogAnalyzer to perform analysis of the logs.
- action='analyze'
- 'analysis_src=path' - Path to
- log.analysis.match.set.txt
- log.analysis.ignore.set.txt
- Destination path where following files will be placed:
- 'analysis_destination=path'
- log.analysis.result.log
- log.analysis.summary.log
- Backed up log file(s).
- 'analysis_destination=path'
- log_file_list=path,{path} 1.List of log files to analyze.
- start_marker="string"
- end_marker="string"
PTF tests will not directly interact with LogAnalyzer, because this function will be performed by Ansible script.
Ansible script will provide LogAnalyzer with input all input parameters. The Ansible script will create ssh session to DUT through which it will:
- invoke LogAnalyzer, before starting PTF test - to put start-marker into log file.
- invoke LogAnalyzer at test end to put end-marker
- invoke LogAnalyzer to perform analysis and create output file. LogAnalyser return code will indicate whether any errors were found.
- copy all output files produced by LogAnalyzer from DUT.
If LogAnalyzer returns non-0 value
- the script will back up all test files and logs from DUT to a directory which will have same name as runID (see definition of the start marker).
- the content of log.analysis.summary.log will be printed to the console, following output of PTF test.
loganalyzer input parameters:
--help Print usage
--verbose Print verbose output during the run
--action init|analyze - action to perform.
init - initialize analysis by placing start-marker
to all log files specified in --logs parameter.
analyze - perform log analysis of files specified in --logs parameter.
--out_dir path Directory path where to place output files,
must be present when --action == analyze
--logs path{,path} List of full paths to log files to be analyzed.
Implicetly system log file will be also processed
--run_id string String passed to loganalyzer, uniquely identifying
analysis session. Used to construct start/end markers.
--match_files_in path{,path} List of paths to files containing strings. A string from log file
matching any string from match_files_in will be collected and
reported. Must be present when action == analyze.
By default syslog will be always analyzed and should be passed by match_files_in.
--ignore_files_in path{,path} List of paths to files containing string.
A string from log file matching any string from these
files will be ignored during analysis. Must be present
when action == analyze.
python /tmp/loganalyzer.py --action init --run_id fib.2016-12-01.22-16-24 This command will write a start marker in syslog: Dec 2 00:10:14 str-msn2700-04 INFO start-LogAnalyzer-fib.2016-12-02.00-10-09
python /tmp/loganalyzer.py --action analyze --run_id fib.2016-12-01.22-16-24 --out_dir /tmp/test_results --match_files_in test_match_messages.txt,common_match.txt
This command will write end-marker in syslog and perform analysis. Dec 2 00:10:18 str-msn2700-04 INFO end-LogAnalyzer-fib.2016-12-02.00-10-09
Strings between the start and end marker will be analyzed, see below for produced output.
s, "removeNeighbor: Neighbor is still referenced"
r, ".ERR", ".WARN", "crash"
r, "kernel:.*Oops", "kernel:.*hung", "kernel.*oom\s"
r, "kernel:.*scheduling", "kernel:.*atomic", "kernel:.*panic"
r, "kernel:.*\serr", "kernel:.*allocation", "kernel:.*kill",
r, "kernel:.kmemleak.","kernel:.* Err:"
s, "ERR"
Result file:
-----------Matches found in file:/var/log/syslog-----------
Dec 2 00:10:15 str-msn2700-04 ERR orchagent: :- removeNeighbor: Neighbor is still referenced ip:10.0.0.31
Dec 2 00:10:16 str-msn2700-04 ERR orchagent: :- removeNeighbor: Neighbor is still referenced ip:10.0.0.31
Dec 2 00:10:17 str-msn2700-04 ERR orchagent: :- removeNeighbor: Neighbor is still referenced ip:10.0.0.31
Dec 2 00:10:18 str-msn2700-04 ERR orchagent: :- removeNeighbor: Neighbor is still referenced ip:10.0.0.31
Matches:4 Total matches:4
Summary file:
LOG ANALYSIS SUMMARY
FILE: /var/log/syslog MATCHES 4
TOTAL MATCHES: 4
LogAnalyzer is not intended to measure Scale/Performance, but as a result of log analysis it may report performance related error conditions appeared during a test run.
LogAnalyzer should not cause noticeable degradation of performance of the DUT during analysis.
TODO
DUT will be provided with configuration files for LogAnalyzer to operate on. The files are specified in the section 'Analysis of log messages'. In summary those are:
LogAnalyzer will analyze messages from SONiC, not targeting SAI. SAI and messages will not be analyzed.
Please refer to 'Analysis of log messages' section above.
PTF host needs to be connected to a port through which it will send packets to the switch, and needs to have connection via ports through which the switch will send forward received packet back to the host for validation.
TODO
Test cases
LogAnalyzer is a test tool, which will be invoked by a running PTF test. The test cases will be defined in Test plan document for specific PTF tests, not in this document.
-
For Users
-
For Developers
-
Subgroups/Working Groups
-
Presentations
-
Join Us