Scancode Output Formats

Scan results generated by Scancode are available in different formats, to be specified by the following options.

Output Options

-f, --format <format>
 

Set <output_file> format to one of:

  • csv,
  • html,
  • html-app,
  • json,
  • json-pp,
  • jsonlines,
  • spdx-rdf,
  • spdx-tv

or use <format> as the path to a custom template file.

If no format is specified, the output format defaults to json.

Warning

In later versions, i.e. 3.x, this option syntax changes significantly. Instead of ./scancode --format html, the more concise ./scancode --html is used.

csv Format

Scancode can publish results in the useful .csv format.

The following code performs a scan on the samples directory, and publishes the results in csv format.

./scancode -lpceiu -f csv samples sample.csv

The first line of the csv file contains the headings, and they are:

  • Resource,
  • type,
  • name,
  • base_name,
  • extension,
  • date,
  • size,
  • sha1,
  • md5,
  • files_count,
  • mime_type,
  • file_type,
  • programming_language,
  • is_binary,
  • is_text,
  • is_archive,
  • is_media,
  • is_source,
  • is_script,
  • scan_errors,
  • license__key,
  • license__score,
  • license__short_name,
  • license__category,
  • license__owner,
  • license__homepage_url,
  • license__text_url,
  • license__reference_url,
  • license__spdx_license_key,
  • license__spdx_url,
  • matched_rule__identifier,
  • matched_rule__license_choice,
  • matched_rule__licenses,
  • copyright,
  • copyright_holder,
  • author,
  • email,
  • start_line,
  • end_line,
  • url,
  • package__type,
  • package__name,
  • package__version,
  • package__primary_language,
  • package__summary,
  • package__description,
  • package__size,
  • package__release_date,
  • package__homepage_url,
  • package__notes,
  • package__bug_tracking_url,
  • package__vcs_repository,
  • package__copyright_top_level

Each subsequent line represents one element, which can be any of the following:

  • license
  • copyright
  • package
  • email
  • url

So if a file contains multiple elements, each one gets its own row with the columns mentioned earlier.

../../_images/output_csv.png
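
Because each element gets its own row, it is often handy to regroup the rows per file before reading them. The following is a minimal Python sketch of that idea. It assumes only a handful of the columns listed above, and the sample rows (paths, license key, copyright text, URL) are made up for illustration rather than taken from a real scan.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a ScanCode CSV report: only a few of the columns
# are kept, and the row values are invented for illustration.
SAMPLE_CSV = """\
Resource,type,license__key,copyright,email,url
/samples/README,file,,,,
/samples/zlib/zlib.h,file,zlib,,,
/samples/zlib/zlib.h,,,Copyright (c) Example Author,,
/samples/zlib/zlib.h,,,,,http://example.org/
"""


def group_rows_by_resource(csv_text):
    """Return {resource: [row, ...]}, keeping only the non-empty cells of each row."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        resource = row.pop("Resource")
        # Each row describes a single element, so most columns are empty.
        filled = {column: value for column, value in row.items() if value}
        grouped[resource].append(filled)
    return dict(grouped)


rows = group_rows_by_resource(SAMPLE_CSV)
for resource, elements in rows.items():
    print(resource, len(elements))
```

To try this on a real report, read the file (for example sample.csv from the command above) and pass its text to group_rows_by_resource.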

html Format

Scancode can format the output as a simple HTML page, which helps you quickly visualize the detected license, copyright and other main information in the form of tables.

The following code performs a scan on the samples directory, and publishes the results in html format.

./scancode -lpceiu -f html samples sample.html

The generated HTML page has the following tables:

  • Copyright and Licenses Information
  • File Information
  • Package Information
  • Licenses (Links to Dejacode/License Homepage)
../../_images/output_html1.png ../../_images/output_html2.png ../../_images/output_html3.png

html-app Format

Scancode can also present the output in an HTML visualization tool, which is more helpful than the standard HTML format.

The scanned files are shown in the left sidebar, and the section on the right contains separate tabs for the following:

  • License Summary
  • Copyright Summary
  • Clues
  • File Details
  • Packages

Note

The HTML app also contains a Search option to easily find what you are looking for.

Warning

The html-app feature has been deprecated; use Scancode Workbench instead to visualize scan results (see its official repo).

../../_images/output_html_app1.png ../../_images/output_html_app2.png ../../_images/output_html_app3.png

json Format

Scancode by default outputs scan results in JSON format.

The entire JSON file is structured in the following manner:

It starts with some general information about the scan, such as the options used and the number of files, and then lists all the files.

{
  "scancode_notice": "Generated with ScanCode and provided on an \"AS IS\" BASIS, WITHOUT WARRANTIES\nOR CONDITIONS OF ANY KIND, either express or implied. No content created from\nScanCode should be considered or used as legal advice. Consult an Attorney\nfor any legal advice.\nScanCode is a free software code scanning tool from nexB Inc. and others.\nVisit https://github.com/nexB/scancode-toolkit/ for support and download.",
  "scancode_version": "2.2.1",
  "scancode_options": {
    "--copyright": true,
    "--package": true,
    "--info": true,
    "--license-score": 10,
    "--license-text": true,
    "--format": "json-pp"
  },
  "files_count": 43,
  "files": [
    {
      "path": "samples/JGroups/licenses/apache-1.1.txt",
      "type": "file"
    },
    {
      "path": "samples/JGroups/licenses/apache-1.2.txt",
      "type": "file"
    }
  ]
}

Note

The default json format prints the whole report without line breaks, spaces or indentation, which makes it hard to read.
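
Since the report is a single JSON object with the scan information first and the per-file entries under "files", it can be read with any JSON parser. The sketch below uses Python's standard json module on a trimmed-down, illustrative report. The "path" and "type" keys follow the per-file sample in the json-pp section, and the second file path is a made-up placeholder.

```python
import json

# Trimmed-down stand-in for a ScanCode JSON report; values are illustrative
# and most per-file fields are left out.
SAMPLE_REPORT = """
{"scancode_version": "2.2.1",
 "scancode_options": {"--copyright": true, "--info": true},
 "files_count": 2,
 "files": [
   {"path": "samples/JGroups/licenses/apache-1.1.txt", "type": "file"},
   {"path": "samples/example/placeholder.txt", "type": "file"}
 ]}
"""


def summarize(report_text):
    """Return (version, enabled options, file paths) from a report string."""
    report = json.loads(report_text)
    enabled = sorted(
        option for option, value in report["scancode_options"].items() if value
    )
    paths = [entry["path"] for entry in report["files"]]
    return report["scancode_version"], enabled, paths


version, options, paths = summarize(SAMPLE_REPORT)
print(version, options, len(paths))
```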

json-pp Format

json-pp stands for JSON Pretty-Print. The plain json format prints the whole output on one line, which is hard to read when you look at the file itself or print it to stdout. This option produces the same JSON, but properly spaced and indented so that it is easy to read.

Here’s a sample JSON output for one file:

{
  "path": "samples/JGroups/licenses/apache-1.1.txt",
  "type": "file",
  "name": "apache-1.1.txt",
  "base_name": "apache-1.1",
  "extension": ".txt",
  "date": "2019-09-18",
  "size": 2937,
  "sha1": "186d9195787fcbf2e5401b966159395640e06d11",
  "md5": "8c909d7735f28f4fdb0128ee57fb430e",
  "files_count": null,
  "mime_type": "text/plain",
  "file_type": "ASCII text, with CRLF line terminators",
  "programming_language": null,
  "copyrights": [
    {
      "statements": [
        "Copyright (c) 2000 The Apache Software Foundation."
      ],
      "holders": [
        "The Apache Software Foundation."
      ],
      "authors": [],
      "start_line": 4,
      "end_line": 5
    }
  ],
  "packages": []
},
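
The difference between the two formats is purely one of whitespace, which can be illustrated with Python's standard json module. This is only a sketch of the idea and not necessarily how Scancode produces either format; the record is a shortened version of the sample above.

```python
import json

# Shortened version of the per-file record shown above.
record = {
    "path": "samples/JGroups/licenses/apache-1.1.txt",
    "type": "file",
    "packages": [],
}

# Compact form: everything on one line, no extra spaces.
compact = json.dumps(record, separators=(",", ":"))

# Pretty-printed form: one key per line, indented by two spaces.
pretty = json.dumps(record, indent=2)

print(compact)
print(pretty)
```

Both strings parse back to the same data, so an existing one-line report can be re-indented after the fact by loading it and dumping it again with indent set.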

jsonlines Format

Scancode also has a jsonlines format option, where the report for each scanned file is written on a single line. Here is a sample line from a report generated by the jsonlines format:

{"files":[{"path":"samples/zlib/ada","licenses":[],"copyrights":[],"packages":[]}]}

Note

This jsonlines format also omits other file information like type, name, date, extension, sha1 and md5 hashes, programming language etc.
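
A jsonlines report can be processed one line at a time, without loading the whole file. The following Python sketch assumes that each line has the shape of the sample line above, i.e. an object with a "files" list. The second sample line is a hypothetical addition for illustration.

```python
import json

# First line copied from the sample above; the second is a hypothetical entry.
SAMPLE_JSONLINES = """\
{"files":[{"path":"samples/zlib/ada","licenses":[],"copyrights":[],"packages":[]}]}
{"files":[{"path":"samples/zlib/example.c","licenses":[],"copyrights":[],"packages":[]}]}
"""


def iter_file_records(text):
    """Yield every file record found in a jsonlines report string."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        document = json.loads(line)
        # Lines without a "files" key (if any) are skipped in this sketch.
        for record in document.get("files", []):
            yield record


paths = [record["path"] for record in iter_file_records(SAMPLE_JSONLINES)]
print(paths)
```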

Comparing Different json Output Formats

Default json Output:

../../_images/output_json.png

json-pp Output:

../../_images/output_jsonpp.png

jsonlines Output:

../../_images/output_jsonlines.png

spdx-rdf Format

SPDX stands for “Software Package Data Exchange” and is an open standard for communicating software bill of material information (including components, licenses, copyrights, and security references).

Learn more in the SPDX specifications and in the SPDX GitHub repository.

In this format, the file is structured as a dictionary of named properties and classes using W3C’s RDF technology.

../../_images/output_spdx_rdf1.png

spdx-tv Format

This format is another SPDX variant, with the output file being structured in the following manner:

It starts with:

# Document Information

SPDXVersion: SPDX-2.1
DataLicense: CC0-1.0
DocumentComment: <text>Generated with ScanCode and provided on an "AS IS" BASIS, WITHOUT WARRANTIES
OR CONDITIONS OF ANY KIND, either express or implied. No content created from
ScanCode should be considered or used as legal advice. Consult an Attorney
for any legal advice.
ScanCode is a free software code scanning tool from nexB Inc. and others.
Visit https://github.com/nexB/scancode-toolkit/ for support and download.</text>


# Creation Info

Creator: Tool: ScanCode 2.2.1
Created: 2019-09-22T21:55:04Z

After a section titled #Packages, a list follows.

../../_images/output_spdx_tv_package.png

The information for each file is listed under its own #File title, with the following fields:

  • FileName
  • FileChecksum
  • LicenseConcluded
  • LicenseInfoInFile
  • FileCopyrightText

An example goes as follows:

../../_images/output_spdx_tv_file.png

After the files section, there’s a section for licences under a #Licences title, with the following information for each licence:

  • LicenseID
  • LicenseComment
  • ExtractedText

Here’s an example:

../../_images/output_spdx_tv_licenses.png
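
The tag-value layout shown in this section (one "Tag: value" pair per line, "#" section titles, and multi-line values wrapped in <text>...</text>) is simple enough to read with a few lines of Python. The sketch below is a minimal reader written for illustration, not a complete SPDX parser; the sample text is a shortened version of the document header shown earlier, with the comment text abbreviated.

```python
# Shortened version of the header shown earlier; the comment text is abbreviated.
SAMPLE_TV = """\
# Document Information

SPDXVersion: SPDX-2.1
DataLicense: CC0-1.0
DocumentComment: <text>Generated with ScanCode
for illustration only.</text>

# Creation Info

Creator: Tool: ScanCode 2.2.1
Created: 2019-09-22T21:55:04Z
"""


def parse_tag_values(text):
    """Return a list of (tag, value) pairs; '#' titles and blank lines are skipped."""
    pairs = []
    lines = iter(text.splitlines())
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split on the first ": " only, so values may contain colons themselves.
        tag, _, value = line.partition(": ")
        if value.startswith("<text>"):
            if "</text>" not in value:
                # Multi-line value: keep reading until the closing </text>.
                parts = [value]
                for extra in lines:
                    parts.append(extra)
                    if "</text>" in extra:
                        break
                value = "\n".join(parts)
            value = value[len("<text>"):]
            if value.endswith("</text>"):
                value = value[: -len("</text>")]
        pairs.append((tag, value))
    return pairs


pairs = parse_tag_values(SAMPLE_TV)
for tag, value in pairs:
    print(tag, "=>", value)
```

A list of pairs is used rather than a dictionary because tags such as FileName repeat once per file in a full report.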

Custom Output Format

While the built-in output formats are convenient for a variety of use cases, one may wish to create their own output template, which can be passed to the --format argument. Scancode makes this very easy, as it uses the popular Jinja2 template engine. Simply pass the path to the custom template to the --format argument, or drop it in a folder in the src/scancode/templates directory.

For example, if I wanted a simple CLI output I would create a template2.html (the file name and extension do not matter) with the particular data I wish to see. In this case, I am only interested in the license and copyright data for this particular scan.

## template2.html:
[
    {% if results.license_copyright %}
        {% for location, data in results.license_copyright.items() %}
            {% for row in data %}
  location:"{{ location }}",
  {% if row.what == 'copyright' %}copyright:"{{ row.value|escape }}",{% endif %}
             {% endfor %}
         {% endfor %}
    {% endif %}
]

Now I can run scancode using my newly created template:

$ ./scancode -f template2.html -c samples/ t.json
Scanning files...
  [####################################]  46
Scanning done.

Now our results are saved in t.json and we can easily view them with head t.json:

[
  location:"samples/JGroups/LICENSE",
  copyright:"Copyright (c) 1991, 1999 Free Software Foundation, Inc.",

  location:"samples/JGroups/LICENSE",
  copyright:"copyrighted by the Free Software Foundation",
]
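
To make the template's loops easier to follow, here is a rough plain-Python equivalent. It is only a sketch under assumptions: the shape of results.license_copyright (a mapping from file location to a list of rows with "what" and "value" entries) is inferred from the template itself, html.escape stands in for Jinja2's escape filter, and the blank lines that the template leaves in its output are not reproduced. The sample values are copied from the output shown above.

```python
from html import escape

# Assumed shape of results.license_copyright, inferred from the template:
# file location -> list of rows, each with "what" and "value" entries.
license_copyright = {
    "samples/JGroups/LICENSE": [
        {"what": "copyright",
         "value": "Copyright (c) 1991, 1999 Free Software Foundation, Inc."},
        {"what": "copyright",
         "value": "copyrighted by the Free Software Foundation"},
    ],
}


def render(results):
    """Mimic the template: one location line per row, plus a copyright line if applicable."""
    lines = ["["]
    if results:
        for location, data in results.items():
            for row in data:
                lines.append(f'  location:"{location}",')
                if row["what"] == "copyright":
                    lines.append(f'  copyright:"{escape(row["value"])}",')
    lines.append("]")
    return "\n".join(lines)


output = render(license_copyright)
print(output)
```

Note that, as in the template, the location line is written for every row, while the copyright line only appears for rows whose "what" is "copyright".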