How to Run a Scan

In this simple tutorial, we perform a basic scan on the samples directory that is distributed by default with ScanCode Toolkit.

Warning

This tutorial uses ScanCode Toolkit version 2.2.1. If you are using a newer version, refer to the documentation for that version.

Warning

This tutorial currently covers Linux-based systems only. Help for Windows and macOS will be added later.

Setting up a Virtual Environment

ScanCode Toolkit 2.2.1 and ScanCode Workbench 2.4.1 are not compatible with Python 3.x, so we create a virtual environment using the virtualenv tool with a Python 2.7 interpreter.

The following commands set up and activate the virtual environment venv-scan2.2.1:

virtualenv -p /usr/bin/python2.7 venv-scan2.2.1
source venv-scan2.2.1/bin/activate
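The two commands above assume that /usr/bin/python2.7 and virtualenv are both present. As a minimal sketch, a POSIX shell function can check for them first; make_scan_venv is a hypothetical name (not part of ScanCode), and its defaults are the interpreter path and environment name used above.

```shell
# Hypothetical helper (not part of ScanCode): create and activate the
# Python 2.7 virtual environment used in this tutorial.
# Source the file containing this function so activation affects your shell.
make_scan_venv() {
  py=${1:-/usr/bin/python2.7}   # interpreter path used in this tutorial
  venv=${2:-venv-scan2.2.1}     # virtual environment directory
  if [ ! -x "$py" ]; then
    echo "Python 2.7 interpreter not found at $py" >&2
    return 1
  fi
  if ! command -v virtualenv >/dev/null 2>&1; then
    echo "virtualenv is not installed" >&2
    return 1
  fi
  # Create the environment, then activate it in the current shell.
  virtualenv -p "$py" "$venv" && . "$venv/bin/activate"
}
```

After sourcing the file, calling make_scan_venv with no arguments is equivalent to the two commands above, but stops early with a message if a prerequisite is missing.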

Setting up ScanCode Toolkit

Download the ScanCode Toolkit 2.2.1 tarball or .zip archive from the Assets section of the ScanCode Toolkit GitHub Releases page, then extract the archive from the command line:

For .zip archive:

unzip scancode-toolkit-2.2.1.zip

For .tar.bz2 archive:

tar -xvf scancode-toolkit-2.2.1.tar.bz2

Alternatively, right-click the archive in your file manager and select “Extract Here”.
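The choice between the two commands depends only on the archive's extension. As a sketch, a small POSIX shell function can make that choice; extract_release and the DRY_RUN variable are hypothetical names, and setting DRY_RUN=1 prints the command instead of running it.

```shell
# Hypothetical helper (not part of ScanCode): extract a downloaded release
# archive with the tool that matches its extension.
# Set DRY_RUN=1 to print the command instead of running it.
extract_release() {
  archive=$1
  case "$archive" in
    *.zip)     set -- unzip "$archive" ;;
    *.tar.bz2) set -- tar -xvf "$archive" ;;
    *)
      echo "unsupported archive: $archive" >&2
      return 1
      ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, extract_release scancode-toolkit-2.2.1.zip runs the same unzip command shown above.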

To check that the prerequisites are installed, open a terminal in the extracted directory and run:

./scancode --help

This configures ScanCode and displays the command-line help text.

Looking into Files

As mentioned previously, we are going to scan the samples directory distributed by default with ScanCode Toolkit. Here is the directory structure with its files:

[Figure: directory structure of the samples directory]

Notice that the samples include an archive, zlib.tar.gz. To scan the files inside this package as well, we must extract it before running the scan.
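To see which archives a directory contains before extracting it, a find command is enough. The sketch below is hypothetical (list_archives is not a ScanCode command) and checks only a few common extensions; extractcode itself recognizes many more archive formats.

```shell
# Hypothetical helper (not part of ScanCode): list archive files under a
# directory by extension. Only a few common extensions are checked here.
list_archives() {
  find "${1:-samples}" -type f \
    \( -name '*.tar.gz' -o -name '*.tgz' -o -name '*.tar.bz2' -o -name '*.zip' \) |
    sort
}
```

Run from the extracted toolkit directory, list_archives samples should include the zlib.tar.gz archive.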

Performing Extraction

To extract the archives inside the samples directory, run:

./extractcode samples

This extracts the zlib.tar.gz package:

[Figure: extractcode output for the samples directory]

Note

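By default, extractcode extracts archives recursively, including archives nested inside other archives. Use the --shallow option to extract only the top-level archives without extracting nested ones.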
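After running extractcode, you can check that every archive has been extracted. The sketch below assumes extractcode's default of extracting each archive into a sibling directory named after the archive with an -extract suffix (for example, zlib.tar.gz-extract); unextracted_archives is a hypothetical name, and only a few common extensions are checked.

```shell
# Hypothetical helper (not part of ScanCode): print archives under a directory
# that have no sibling "<archive>-extract" directory, i.e. that appear not to
# have been extracted yet.
unextracted_archives() {
  find "${1:-samples}" -type f \
    \( -name '*.tar.gz' -o -name '*.tgz' -o -name '*.tar.bz2' -o -name '*.zip' \) |
    while IFS= read -r archive; do
      [ -d "${archive}-extract" ] || printf '%s\n' "$archive"
    done
}
```

Empty output after ./extractcode samples means every matched archive was extracted. After ./extractcode --shallow samples, archives nested inside the -extract directories may still be listed, since they were deliberately left unextracted.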

Deciding Scan Options

Below are some common scan options to consider, depending on your requirements, before you start the actual scan.

  1. Choose the basic scan options, i.e. -c (copyrights), -l (licenses), -p (packages), -e (emails), -u (URLs), and -i (file information), according to your requirements. If you do not need a specific type of information (say, licenses), leave its option out: the more you scan for, the longer the scan takes to complete.
  2. Set --license-score INTEGER if license matching accuracy matters: only matches scoring at least INTEGER are reported (the default is 0; raising it filters out weaker matches). Adding --license-text also includes the matched text in the results.
  3. Use -n INTEGER to speed up the scan by running INTEGER parallel processes.
  4. Use --timeout FLOAT to stop scanning any file that takes longer than FLOAT seconds, so a single slow file does not stall the scan.
  5. Use --ignore <pattern> to skip files matching a pattern, such as "*.java".
  6. Use -f <output_format> to pick an output format that suits how you will use the results. Here we use json-pp (pretty-printed JSON), because ScanCode Workbench imports only JSON files.
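The options above can be collected in one place so that each choice is explicit. The following POSIX shell function is a hypothetical wrapper, not part of ScanCode; the SCAN_* variable names and DRY_RUN are assumptions, and the defaults reproduce the command used in the next section.

```shell
# Hypothetical wrapper (not part of ScanCode): build the scancode command
# line from the options discussed above. Each SCAN_* variable overrides one
# option; the defaults match the command used in this tutorial.
# Set DRY_RUN=1 to print the command instead of running it.
run_scan() {
  input=${1:-samples}         # directory or file to scan
  output=${2:-sample.json}    # where the results are written
  set -- ./scancode \
    -f "${SCAN_FORMAT:-json-pp}" \
    "${SCAN_TYPES:--clpeui}" \
    -n "${SCAN_PROCESSES:-2}" \
    --ignore "${SCAN_IGNORE:-*.java}" \
    --timeout "${SCAN_TIMEOUT:-20}" \
    "$input" "$output"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, SCAN_PROCESSES=4 run_scan samples sample.json would run the same scan with four parallel processes instead of two.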

For the complete list of options, refer to All Available Options.

Running the Scan

Now run the scan with the chosen options:

./scancode -f json-pp -clpeui -n 2 --ignore "*.java" --timeout 20 samples sample.json

A progress report is shown:

Scanning files for: infos, licenses, copyrights, packages, emails, urls with 2 process(es)...
Building license detection index...Done.
Scanning files...
[####################] 41
Scanning done.
Scan statistics: 41 files scanned in 38s.
Scan options:    infos, licenses, copyrights, packages, emails, urls with 2 process(es).
Scanning speed:  1.23 files per sec.
Scanning time:   33s.
Indexing time:   5s.
Saving results.
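Before importing sample.json into ScanCode Workbench, you may want to confirm that the scan produced a non-empty, well-formed JSON file. This sketch uses Python's standard json.tool module, which is available in the Python 2.7 virtual environment created earlier; check_scan_output and the PYTHON variable are hypothetical names.

```shell
# Hypothetical helper (not part of ScanCode): check that a scan output file
# exists, is non-empty, and parses as JSON. PYTHON defaults to "python",
# which is the Python 2.7 interpreter inside the activated virtual environment.
check_scan_output() {
  out=${1:-sample.json}
  py=${PYTHON:-python}
  if [ ! -s "$out" ]; then
    echo "missing or empty scan output: $out" >&2
    return 1
  fi
  if ! "$py" -m json.tool "$out" >/dev/null; then
    echo "not valid JSON: $out" >&2
    return 1
  fi
}
```

Running check_scan_output sample.json in the toolkit directory returns success only if the file exists and parses as JSON; it does not check the contents of the scan results.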