Toyplot

1.0 Introduction

This project was completed for the University of Victoria's Documenting and Understanding Software Systems course. It is an in-depth analysis of an open-source software system, intended to build a deeper understanding of the system's purpose, implementation, and underlying architecture. The system chosen for this chapter is Toyplot, a multi-functional plotting library developed by Sandia National Laboratories. It is written in Python to give scientists a toolset for producing interactive, easily shareable data graphics, which it achieves by exporting plots in native web formats such as HTML and SVG. For the user's convenience, Toyplot can be used from third-party applications such as Jupyter Notebook while generating plots.

This chapter details the results of an analysis carried out over the Spring 2018 semester by exploring the existing system documentation, the system's GitHub repository, and Toyplot's functionality. It covers the project stakeholders, business goals, quality attributes, architecturally significant requirements, a module view of the system, a component and connector view of the system, and an examination of the quality of Toyplot's code.

Project

Sandia National Laboratories' GitHub Repository for Toyplot

Team

2.0 Identifying Project Stakeholders and Business Goals

In this milestone, we examined the documentation [1] provided by Toyplot's developers to determine the project's primary stakeholders. The documentation also revealed the business goals the developers aim to meet by building and improving this codebase.

2.1 Stakeholder Table

Stakeholder

Description

Project Owners

Developers

Employees of Sandia National Laboratories (SNL); Timothy Shead was the lead developer.

Maintainers

There are sporadic commits from the public, but Shead is generally the sole maintainer.

Support Staff

Timothy Shead is the primary point of contact for developer support, as his contact information is listed on the support page of the Toyplot website.

Data Graphic Users

Developers who use Toyplot to generate and export graphs of their own data.

Viewers

People who read and interact with the generated graphs.

2.2 Business Goals

Business Goals

Description

Support web standards

Toyplot embraces HTML, SVG, and JavaScript so that graphics are always native and embeddable.

Progressive functions

Interactive data cursors, hyperlinks, and export of data to CSV.

Aesthetics in data graphics

Embrace standard design principles and best practices from the data visualization community.

Reliable services

Ensure the accuracy of output and the durability of the program.

Support multiple publishing formats

Ensure that those who wish to use static formats, such as PDF or PNG, for publishing are also able to use Toyplot.

Community based product improvement

User feedback and suggestions are used to improve the functions already provided by Toyplot and to add new ones. This helps both the quality of the product and the reputation of the project.

Shareable and mobile

Graphics are always completely self-contained and embeddable without need for a server.

Efficient animation

High-quality data graphics without compression artifacts.

Map plot functionality

Plot data with latitude/longitude geospatial information.

Touch screen function

Interact with graphic data plots on touch screen devices.

Up-to-date documentation

Up-to-date tutorials and example visualizations within documentation.

3.0 Architecturally Significant Requirements and Utility Tree

In this milestone, we analyzed both the documentation [2] and the code to determine the architecturally significant requirements (ASRs) for Toyplot. These are organized by the quality attributes that they contribute to. Following this is a utility tree consisting of a subset of the ASRs and three detailed quality attribute scenarios.

3.1 ASRs

  • Maintainability:

    • The system encourages continuous improvement by addressing bug reports.

  • Usability:

    • The system provides an intuitive API that handles common plotting needs (simple figures with a single coordinate system and a single mark) with minimal code.

    • The system supports interoperability by exporting the graphic along with its raw data in CSV format, giving users more control over the data.

    • The system outputs graphics with intuitive styling, and uses best practices for clarity and aesthetics.

    • The system adapts to developers' environments and is usable in either Python 2 or 3.

    • The system integrates seamlessly with Jupyter.

  • Availability:

    • The system outputs graphics in native web standards to ensure broad support across platforms and use cases.

    • The system captures and handles failures by generating bug reports.

    • The system and its dependencies can be obtained from various sources, such as GitHub, pip, and Anaconda.

  • Testability:

    • The system guarantees that the graph accurately reflects the input data.

    • The system has wide test coverage (95%+).

    • The system uses a continuous integration service, Travis CI, to run a series of regression tests that ensure the system does not backslide into an undesirable state after changes are made.

  • Performance:

    • The system renders graphs with low latency.

  • Modifiability:

    • The system supports the addition of new features at minimal time cost.

3.2 Utility tree

3.3 Scenarios

Aspect

Details

Scenario Name

Native Web Standards

Business Goals

Wide user base and ease of use

Quality Attributes

Usability, Availability

Stimulus

Execute plot generation

Stimulus Source

Developers

Response

System generates a plot in SVG, PNG, HTML, MP4, etc.

Response Measure

User feedback and satisfaction; ability to display graphics across different platforms

Aspect

Details

Scenario Name

Unique Plot Aesthetics

Business Goals

Embracing standard design principles and best practices

Quality Attributes

Usability

Stimulus

Specify the style and colors of the desired plot, overriding the default style information when creating the plot

Stimulus Source

End user

Response

System produces the plot with the specified style and colors

Response Measure

Direct correspondence with industry-standard CSS styling (ease of use) and output of the desired aesthetic

Aspect

Details

Scenario Name

Low Latency Plot Generation

Business Goals

Superior speed and efficiency compared with similar tools

Quality Attributes

Performance

Stimulus

Send a request to generate a plot

Stimulus Source

User invokes rendering module

Response

Autorendering, display, or export through one of multiple backends

Response Measure

Rendering in < 0.05 s for native autorendering, independent of the browser output system

4.0 Module View

This milestone documents the module structure of Toyplot. The views presented below pertain to two of the Quality Attribute Scenarios (QAS) discussed in Milestone 2: Native Web Standards and Low Latency Plot Generation. The primary focus is to detail how Toyplot's structure facilitates the quality attributes associated with these scenarios.

4.1 Primary Presentation

We chose to present our primary view as a module "uses" diagram, which shows which modules use which other modules and therefore captures their dependencies. This structure gives a clearer picture of how Toyplot's functionality is designed with regard to the above QAS.

4.2 Element Catalogue

The following element catalogue introduces the elements and relations relevant to the module uses diagram in the primary view. The descriptions of the included modules aim to explain the important architectural elements of the Toyplot system with regard to the aforementioned QAS.

NumPy

NumPy Array

Numerical data arranged in an array-like structure in Python. With the NumPy import, the user is able to load in data using built-in functions for creating arrays from scratch.

This numpy.linspace() example creates an array with a specified number of elements (50 by default), evenly spaced between the specified start and end values:

import numpy

x = numpy.linspace(0, 10)  # 50 evenly spaced values from 0 to 10 inclusive
y = x ** 2                 # element-wise square of x

Toyplot

canvas

Implements the toyplot.canvas.Canvas class, which defines the space available for creating plots. Every Toyplot figure begins with a toyplot.canvas.Canvas, a drawing area upon which the caller adds marks. Coordinate systems are used to map data values into canvas coordinates. The Canvas class is defined in the toyplot.canvas module and is also available as toyplot.Canvas.

cartesian

Used for 2D data mapping on the canvas space. It can be created implicitly when using the convenience API, or explicitly using toyplot.canvas.Canvas.cartesian(). By default, it is sized to fill the entire canvas.

Example of creating Cartesian axes explicitly:

import toyplot

canvas = toyplot.Canvas(width=600, height=300)
axes1 = canvas.cartesian(bounds=(30, 270, 30, 270))  # (xmin, xmax, ymin, ymax) in canvas units

Example of creating Cartesian axes implicitly:

toyplot.plot(y, width=300);  # the trailing semicolon hides the returned tuple in Jupyter

projection

Classes and functions for projecting coordinates between spaces.
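To make the idea of projection concrete, here is a minimal sketch of a linear projection that maps a data interval onto a canvas interval and back. The LinearProjection class is hypothetical and only illustrates the concept; it is not the toyplot.projection API.

```python
class LinearProjection:
    """Hypothetical linear map from a data domain onto a canvas range."""

    def __init__(self, domain_min, domain_max, range_min, range_max):
        if domain_min == domain_max:
            raise ValueError("The data domain must have non-zero width.")
        self.domain_min, self.domain_max = domain_min, domain_max
        self.range_min, self.range_max = range_min, range_max

    def __call__(self, value):
        # Fraction of the way through the domain, applied to the range.
        t = (value - self.domain_min) / (self.domain_max - self.domain_min)
        return self.range_min + t * (self.range_max - self.range_min)

    def inverse(self, canvas_value):
        # Map a canvas coordinate (e.g. a mouse position) back to data space.
        if self.range_min == self.range_max:
            raise ValueError("Cannot invert a zero-width canvas range.")
        t = (canvas_value - self.range_min) / (self.range_max - self.range_min)
        return self.domain_min + t * (self.domain_max - self.domain_min)


# x data in [0, 10] mapped onto the horizontal span 30..270 of a canvas.
x_proj = LinearProjection(0, 10, 30, 270)
# Canvas y grows downward, so the y projection swaps the ends of its range.
y_proj = LinearProjection(0, 100, 270, 30)
print(x_proj(5), y_proj(100))  # 150.0 30.0
```

The inverse direction is what interactive features such as data cursors need: turning a pointer position on the canvas back into data values.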

style

Functionality for working with Cascading Style Sheets (CSS) style information. Every mark added to a figure will have at least one (and possibly more than one) set of styles that control its appearance.

coordinates

Classes and functions for working with coordinate systems. The range of a coordinate system (the area of the canvas it occupies) is specified when it is created. Use toyplot.coordinates.Cartesian for two-dimensional data and toyplot.coordinates.Numberline for one-dimensional data. The coordinates module also provides the plotting methods that choose the plot type and create the corresponding mark objects defined in mark.py.

mark

Provides data objects (marks) that are displayed on a toyplot.canvas.Canvas. Marks are added to coordinate systems using the factory functions that the coordinate systems provide.

layout

Provides layout algorithms. By default, cartesian and table coordinates are sized to fill the entire canvas.

marker

Functionality for managing markers (shapes used to highlight datums in plots and text). By default, plot and scatterplot markers are small circles, but many other shapes can be specified.

data

Classes and functions for working with raw data. Uses numpy to create data to use for figures.

broadcast

The functions in this module are used by Toyplot to handle the conversion from constant, per-series, and per-datum values into their canonical 1D and 2D array representations.
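The three cases can be sketched in plain Python. broadcast_2d is a hypothetical helper, not the toyplot.broadcast API, and it assumes a canonical layout with one row per datum and one column per series.

```python
def broadcast_2d(value, n_datums, n_series):
    """Hypothetical helper: expand a value into an n_datums x n_series grid.

    Strings count as single values, so a color name such as "red" is a constant.
    """
    def is_sequence(v):
        return isinstance(v, (list, tuple))

    # Constant: one value applies to every datum of every series.
    if not is_sequence(value):
        return [[value] * n_series for _ in range(n_datums)]
    # Per-series: one value per series, repeated for every datum.
    if not any(is_sequence(v) for v in value):
        if len(value) != n_series:
            raise ValueError(f"Expected {n_series} per-series values, got {len(value)}.")
        return [list(value) for _ in range(n_datums)]
    # Per-datum: already a full grid, so only its shape is checked.
    if len(value) != n_datums or any(
        not is_sequence(row) or len(row) != n_series for row in value
    ):
        raise ValueError(f"Expected a {n_datums} x {n_series} grid of values.")
    return [list(row) for row in value]


print(broadcast_2d("red", 2, 3))  # [['red', 'red', 'red'], ['red', 'red', 'red']]
```

Normalizing every input to one shape up front means the rendering code only ever has to handle the full grid.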

units

Functionality for performing unit conversions. There are several places in Toyplot where you will need to specify quantities with real-world units, including canvas dimensions, font sizes, and target dimensions for document-oriented backends.
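A minimal sketch of such a conversion, assuming CSS pixels as the target unit and the CSS definitions 1in = 96px and 1pt = 1/72in. to_pixels and PX_PER_UNIT are hypothetical names, not the toyplot.units API.

```python
import re

# Pixels per unit, using the CSS definitions 1in = 96px, 1pt = 1/72in, 1in = 2.54cm.
PX_PER_UNIT = {
    "px": 1.0,
    "in": 96.0,
    "pt": 96.0 / 72.0,
    "cm": 96.0 / 2.54,
    "mm": 96.0 / 25.4,
}


def to_pixels(quantity):
    """Hypothetical helper: convert a number or a string such as '2.5in' to pixels."""
    if isinstance(quantity, (int, float)):
        return float(quantity)  # bare numbers are assumed to be pixels already
    match = re.fullmatch(r"\s*(\d*\.?\d+)\s*([a-z]+)\s*", quantity)
    if match is None:
        raise ValueError(f"Cannot parse quantity {quantity!r}.")
    value, unit = match.groups()
    if unit not in PX_PER_UNIT:
        raise ValueError(f"Unknown unit {unit!r}.")
    return float(value) * PX_PER_UNIT[unit]


print(to_pixels("2.5in"))  # 240.0
```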

color

Functionality for managing colors, palettes, and color maps. Colors in Toyplot are represented as red-green-blue-alpha (RGBA) tuples, where each component can range from zero (off) to one (full strength). Palettes group together related collections of color values and color maps are used to perform the real work of mapping data values to colors.
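The RGBA representation and the role of a color map can be sketched as follows. LinearColorMap is a hypothetical two-color map that linearly interpolates each RGBA component; Toyplot's own palettes and color maps offer more than this.

```python
def lerp_color(c0, c1, t):
    """Blend two RGBA tuples component-wise; t=0 gives c0 and t=1 gives c1."""
    return tuple(a + (b - a) * t for a, b in zip(c0, c1))


class LinearColorMap:
    """Hypothetical color map from the data domain [vmin, vmax] to RGBA tuples."""

    def __init__(self, low, high, vmin, vmax):
        if vmin == vmax:
            raise ValueError("vmin and vmax must differ.")
        self.low, self.high = low, high
        self.vmin, self.vmax = vmin, vmax

    def color(self, value):
        t = (value - self.vmin) / (self.vmax - self.vmin)
        t = min(max(t, 0.0), 1.0)  # clamp out-of-range data to the end colors
        return lerp_color(self.low, self.high, t)


# Each RGBA component ranges from 0 (off) to 1 (full strength).
blue = (0.0, 0.0, 1.0, 1.0)
red = (1.0, 0.0, 0.0, 1.0)
cmap = LinearColorMap(blue, red, vmin=0, vmax=10)
print(cmap.color(5))  # (0.5, 0.0, 0.5, 1.0)
```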

mp4

Generate MPEG-4 videos.

reportlab

Support functions for rendering PDF documents and PNG images using ReportLab.

Rendering backends

At the lowest level, Toyplot provides a large collection of rendering backends. Each backend knows how to render a Toyplot canvas to a specific file format, and can typically render the canvas directly to disk, to a caller-provided buffer, or return the raw representation of the canvas for further processing. To generate the desired file format, explicitly render the canvas with one of the following four backends:

pdf

Functions to render PDF documents.

png

Functions to render PNG images.

svg

Functions to render SVG images.

html

Functions to render the canonical HTML representation of a Toyplot figure.
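The three output modes described above (disk, caller-provided buffer, raw return value) can be sketched with a toy backend. render_points_svg is hypothetical: it renders bare (x, y) points rather than a Toyplot canvas, and its fobj parameter only imitates the general pattern.

```python
import io


def render_points_svg(points, width, height, fobj=None):
    """Hypothetical toy backend: draw (x, y) points as SVG circles."""
    circles = "".join(f'<circle cx="{x}" cy="{y}" r="3"/>' for x, y in points)
    svg = (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
           f"{circles}</svg>")
    if fobj is None:
        return svg  # mode 1: return the raw representation
    if isinstance(fobj, str):
        with open(fobj, "w", encoding="utf-8") as stream:
            stream.write(svg)  # mode 2: write directly to a file on disk
    else:
        fobj.write(svg)  # mode 3: write into a caller-provided buffer
    return None


svg_text = render_points_svg([(10, 20), (30, 40)], width=100, height=100)
buffer = io.StringIO()
render_points_svg([(10, 20)], width=100, height=100, fobj=buffer)
```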

Rendering display

Toyplot provides an alternative to rendering figures via backends in the form of display modules, which provide convenient ways to display figures interactively.

browser

Functionality for displaying a Toyplot canvas in a web browser. This is the most portable display module and will open a new browser window containing your figure, with all of Toyplot's interaction and features intact.

4.3 Context Diagram

The following diagram outlines the context for the preceding primary view, in particular the external elements used by the Toyplot system, to give a better understanding of how those elements contribute to Toyplot's functionality.

4.4 Behavior Diagram

The following diagram demonstrates how Jupyter Notebook, a Python development tool for which Toyplot is optimized, interacts with the code to produce a simple plot and then render it as a self-contained HTML file. By "simple plot" we mean a plot that uses the default values for Toyplot's aesthetic components, such as line colour. The behavior diagram is a procedural sequence diagram; it models the exchange of stimuli over time between the objects involved in this process. These stimuli are the function and method calls invoked in Jupyter Notebook or by the files depicted in the diagram, along with the results those calls compute. We chose this view because it best shows how easily the system supports a user who wants to plot a graph of their data and render it in a web-ready format through a third-party application such as Jupyter (usability). It also shows how responsive Toyplot is to stimuli from within the system and from external sources (performance): creating these plots involves a long sequence of actions that often completes in under a second, depending on the amount of data being plotted.

4.5 Rationale

Toyplot's system design is well thought out, with an intuitive file structure. There is a clear separation of concerns, and the components appear flexible and modular, as each responsibility in Toyplot is managed by its own module (file).

As identified in our first QAS, one of Toyplot's main selling points is the ability to export visuals to multiple web standards and platforms. To support this, Toyplot's data types and output formats are split into their own modules. For instance, each of the important rendering mechanisms has a separate module: mp4.py, png.py, html.py, svg.py, and pdf.py. This keeps the system easy to maintain and allows new functionality to be added seamlessly. To support a new export format, developers need only add a new module and implement the required rendering functions, without extensive internal debugging (often caused by over-complicated dependencies) of Toyplot's other components beyond those that will directly use, or be used by, the new rendering backend.
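The low cost of adding a format can be illustrated with a hypothetical registry that maps file extensions to rendering functions. This is not Toyplot's code; it only shows how, when each format is self-contained, supporting a new one means adding one function without touching the others.

```python
import os

# Hypothetical registry mapping a file extension to its rendering function.
RENDERERS = {}


def register_renderer(extension):
    """Decorator that registers a rendering function for one file extension."""
    def decorator(func):
        RENDERERS[extension] = func
        return func
    return decorator


@register_renderer(".csv")
def render_csv(points):
    return "x,y\n" + "".join(f"{x},{y}\n" for x, y in points)


@register_renderer(".txt")
def render_txt(points):
    return "\n".join(f"({x}, {y})" for x, y in points)


def export(points, filename):
    """Choose the renderer from the file extension; no other code has to change."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in RENDERERS:
        raise ValueError(f"No renderer registered for {extension!r}.")
    return RENDERERS[extension](points)
```

Adding another format in this sketch means writing one more decorated function; export() and the existing renderers stay unchanged.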

As mentioned in the Behavior Diagram section above, Toyplot is designed to be used with third-party applications in which the user is comfortable developing code. To supply data, users can hard-code their data coordinates in their implementation or read them from CSV files, as shown in our Context Diagram. This fosters a user-friendly workflow that increases the overall usability of the system. Additionally, Toyplot offers a clean, minimalist interface that lets users who are unfamiliar with Python's plotting libraries produce beautiful plots of their data sets with only a cursory understanding of the underlying functionality. Being able to generate interactive plots for display, or to export them as file types belonging to current web standards, such as HTML, or more traditional formats, such as PDF, gives users the flexibility to meet a wide range of document requirements.

Low latency plot generation is also well considered in the system's component design. There is a clear data flow from a user's input data to the final output data graphic. Additionally, Toyplot's module design keeps plot-generation latency low because there are few dependencies between modules, so fewer modules need to be loaded on Canvas initialization or rendering. Whether for immediate visualization or for rendering to export, the modular design of each function allows for quick results and higher performance.

5.0 Component and Connector View

This milestone documents the component and connector (C&C) structure of Toyplot. The views presented below pertain to the Quality Attribute Scenario discussed in Milestone 2: Low Latency Plot Generation. The following C&C views illustrate how Toyplot's modules are designed to interact with each other at runtime, while generating a plot figure, in a way that gives users high performance.

5.1 Primary Presentation

The following primary presentation diagram represents, in a component and connector view, the principal runtime elements involved in generating a typical Toyplot figure. This view communicates the dynamic elements and interactions used during this runtime activity. Following the Toyplot tutorial, one can trace through this C&C view to see the end-to-end data flow from inputting data to rendering a figure within a Jupyter notebook. This streamlined design of the Toyplot engine's processes means the desired plot appears in Jupyter in an amount of time imperceptible to the user.

5.2 Element Catalogue

The following element catalogue introduces the elements and relations relevant to the primary presentation C&C diagram above. The descriptions of the included modules and their interactions aim to explain the important architectural modules that handle runtime processing in the Toyplot system, and to help analyze system throughput with regard to the aforementioned performance QAS.

np.array(...)

Numerical data arranged in an array-like structure in Python. With the NumPy import, the user is able to load in data using built-in functions for creating arrays from scratch.

This numpy.linspace() example creates an array with a specified number of elements (50 by default), evenly spaced between the specified start and end values:

import numpy

x = numpy.linspace(0, 10)  # 50 evenly spaced values from 0 to 10 inclusive
y = x ** 2                 # element-wise square of x

canvas

This component is the top-level container for Toyplot drawings and implements toyplot.canvas.Canvas. The canvas object defines the space available for creating plots.

The following is an example of how to explicitly create a canvas object:

import toyplot

canvas = toyplot.Canvas(width=300, height=300)

coordinates

Coordinate systems map 2D data values into canvas coordinates for drawing on the canvas space. The range of a coordinate system (the area of the canvas it occupies) is specified when it is created. The toyplot.canvas.Canvas.cartesian() method returns a toyplot.coordinates.Cartesian object, a standard two-dimensional Cartesian coordinate system. This axes object contains sets of nested properties, including interactive ones, that adjust the behavior of the figure.

Following the above example, add a set of Cartesian axes to the canvas:

axes = canvas.cartesian()

mark

Marks are data objects that are added to coordinate systems for display on a canvas, using factory methods such as toyplot.coordinates.Cartesian.plot() or toyplot.coordinates.Cartesian.scatterplot(); the convenience functions toyplot.plot() and toyplot.scatterplot() create them implicitly. toyplot.mark.Mark objects are an abstract interface in Toyplot and carry no explicit visual representation of their own; it is up to the coordinate system and rendering backend to determine how to render the data.

In this example, the plot method of the axes adds a plot mark using the supplied coordinates:

mark = axes.plot(x, y)

plot

Convenience function for creating a line plot on a new canvas in a single call (see the mark example above). It returns a new canvas object (toyplot.canvas.Canvas), a new set of 2D axes that fill the canvas (toyplot.coordinates.Cartesian), and the new plot mark (toyplot.mark.Plot). The style of the plot marks can be customized with the marker parameter or the CSS style parameter (the default mark colors are drawn from toyplot.color.Palette()).

autorender

Canvas autorendering controls whether a canvas is displayed automatically, without caller intervention, in certain interactive environments such as Jupyter notebooks. It is enabled by default (via toyplot.config.autorender) when a canvas is created, and can be enabled or disabled per canvas.

display html

Importing the Toyplot library and creating a plot with the code examples above in a Jupyter (IPython) notebook causes Canvas._autorender to render the plot into the notebook using the interactive HTML backend (IPython.display.display_html); no other statements are required. Toyplot figures are interactive: users can mouse over the figure to see interactive coordinates and can even extract the data from a figure in CSV format using a context menu.

5.3 Interfaces

5.3.1 Toyplot Convenience API

1. Interface Identity: The toyplot module is a top-level convenience API for creating data visualizations in one line of Python code, either with autorendering within an interactive notebook (such as a Jupyter notebook) or by calling the functions directly and outputting the result with one of Toyplot's rendering backends or display modules.

2. Resources Provided: There are 8 factory functions corresponding to 8 types of data visualization: bars, fill, graph, image, matrix, plot, scatterplot, and table. toyplot.plot() is taken as the example here; refer to the docs for a complete description of the other functions and their parameters.

toyplot.plot(a, b=None, along='x', area=None, aspect=None, color=None, filename=None, height=None, label=None,
margin=50, marker=None, mfill=None, mlstyle=None, mopacity=1.0, mstyle=None, mtitle=None, opacity=1.0, padding=10,
show=True, size=None, stroke_width=2.0, style=None, title=None, width=None, xlabel=None, xmax=None, xmin=None,
xscale='linear', xshow=True, ylabel=None, ymax=None, ymin=None, yscale='linear', yshow=True)

Pre-Conditions:

  • required:

    • Importing toyplot libraries and external python libraries

    • a, b (array-like sets of coordinates): if both a and b are provided, they specify the first and second coordinates, respectively, of each point in the plot. If only a is provided, it supplies the second coordinates, and the first coordinates range over [0, N), where N is the number of values in a.

    • Axes and canvas: toyplot.plot() creates them automatically; to plot onto an existing canvas and axes, call the equivalent method instead, e.g. toyplot.coordinates.Cartesian.plot().

  • optional: all other parameters are either not required or have default values. Refer to the docs for a complete description.

Post-Conditions:

  • Returns a (canvas, axes, mark) tuple, where mark is the new plot mark (toyplot.mark.Plot)
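The a/b rule in the pre-conditions can be illustrated with a small stand-alone helper. resolve_coordinates is hypothetical and not part of Toyplot; it only mirrors the documented behaviour using plain Python lists, and its length check is an assumption of the sketch.

```python
def resolve_coordinates(a, b=None):
    """Hypothetical helper mirroring the documented a/b rule of toyplot.plot()."""
    if b is None:
        # Only `a` given: it supplies the second coordinates, and the first
        # coordinates are 0, 1, ..., N - 1, i.e. the half-open range [0, N).
        return list(range(len(a))), list(a)
    # Both given: pair them up (this length check is an assumption of the sketch).
    if len(a) != len(b):
        raise ValueError("a and b must contain the same number of values.")
    return list(a), list(b)


print(resolve_coordinates([5, 7, 9]))  # ([0, 1, 2], [5, 7, 9])
```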

3. Data Types:

  • toyplot.mark.Plot is a subclass of the abstract interface toyplot.mark.Mark and contains all the mark information needed to plot on a canvas: coordinate_axes, table, coordinates, series, stroke, stroke_width, stroke_opacity, stroke_title, marker, msize, mfill, mstroke, mopacity, mtitle, style, mstyle, mlstyle, and filename.

4. Error Handling:

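If the coordinate data cannot be converted to int or float, data.py does not raise an error: the conversion failure is silently passed over and the column keeps its original type (for example, strings). The conversion logic is:

try:
    result[name] = result[name].astype("int")  # first, try integer conversion
except:
    try:
        result[name] = result[name].astype("float")  # otherwise, try float conversion
    except:
        pass  # neither worked: leave the column unchanged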
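A minimal, self-contained sketch of the same fallback, operating on a plain list of strings rather than the column arrays data.py works with. convert_column is a hypothetical name, and unlike the excerpt it catches only ValueError, so unrelated errors are not silently swallowed.

```python
def convert_column(values):
    """Hypothetical helper: try int, then float, otherwise keep the strings as-is."""
    try:
        return [int(v) for v in values]
    except ValueError:
        try:
            return [float(v) for v in values]
        except ValueError:
            # Neither conversion applies: return the values unchanged, raising nothing.
            return list(values)


print(convert_column(["1", "2", "3"]))  # [1, 2, 3]
print(convert_column(["1", "2.5"]))     # [1.0, 2.5]
print(convert_column(["1", "n/a"]))     # ['1', 'n/a']
```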

5. Variability: N/A

6. Quality Attributes:

  • Usability: The user can generate a plot with a single compact statement.

  • Performance: The operation shall return the result of a plotting request within 0.05s.
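The 0.05 s response measure can be checked with a simple timing harness. within_budget is a hypothetical helper; it takes the best of several runs to reduce noise, and in practice the timed callable would be a plotting call rather than the stand-in used here.

```python
import time


def within_budget(func, *args, budget_seconds=0.05, repeats=5, **kwargs):
    """Hypothetical harness: does the fastest of several calls fit the time budget?"""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best <= budget_seconds, best


ok, elapsed = within_budget(sum, range(100_000))
print(ok, f"{elapsed * 1000:.2f} ms")
```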

7. Rationale:

This convenience API allows the user to create many kinds of plots with a single compact statement. It leverages NumPy and the Toyplot modules to apply default values and minimalist design choices, so the user needs to enter very few configuration parameters.

8. Usage Guide:

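import numpy
import toyplot

x = numpy.linspace(0, 10)                       # 50 evenly spaced values in [0, 10]
y = x ** 2
canvas = toyplot.Canvas(width=300, height=300)  # drawing area
axes = canvas.cartesian()                       # 2D coordinate system filling the canvas
mark = axes.plot(x, y)                          # line mark added to the axes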

This will generate a plot using lines.

5.3.2 Toyplot Marks

1. Interface Identity: Marks are the data objects that are passed to the rendering/display system to generate the result. The rendering specifications are defined by the backends, not by the marks themselves. The Mark base class has 11 subclasses, defined for different rendering tasks. Each subclass validates its required inputs with the internal module toyplot.require.

2. Resources Provided:

  • Resources syntax: toyplot.mark.Scatterplot is the case study for this interface. Scatterplot marks are created by convenience methods such as toyplot.coordinates.Cartesian.scatterplot(), which take the required arguments and construct the mark.

class toyplot.mark.Scatterplot(coordinate_axes, coordinates, filename, marker, mfill, mhyperlink, 
mlstyle, mopacity, msize, mstroke, mstyle, mtitle, table)

  • Resources semantics: The arguments passed to Scatterplot are validated by toyplot.require and then stored on the instance. The following code shows one of these assignments.

self._coordinates = toyplot.require.table_keys(table, coordinates, modulus=D)
N = len(self._coordinates) / D

  • Returned values: toyplot.mark.Scatterplot provides two values:

    domain(axis): Contains the minimum and maximum values for the given axis.

    Markers: Contains a list of toyplot.marker.Marker objects as ordered markers to be used in plotting.

4. Error Handling:

  • Unexpected values: toyplot.require raises an exception if the provided data does not meet the requirement.

def instance(value, types):
    if not isinstance(value, types):
        raise ValueError("Expected %s, received %s." % (types, type(value))) 
    return value
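The validation pattern can be exercised on its own. The sketch reuses the instance check shown above and adds a hypothetical positive check and a hypothetical ToyScatterplot class to show where a ValueError surfaces when a caller passes bad arguments; neither is part of Toyplot.

```python
def instance(value, types):
    # Same check as the excerpt above: return the value or raise ValueError.
    if not isinstance(value, types):
        raise ValueError("Expected %s, received %s." % (types, type(value)))
    return value


def positive(value):
    """Hypothetical check in the same style: require a number greater than zero."""
    value = instance(value, (int, float))
    if value <= 0:
        raise ValueError("Expected a positive number, received %r." % (value,))
    return value


class ToyScatterplot:
    """Hypothetical mark that validates each argument before storing it."""

    def __init__(self, coordinates, msize):
        self._coordinates = instance(coordinates, (list, tuple))
        self._msize = positive(msize)


try:
    ToyScatterplot("not a list", 4)
except ValueError as error:
    print(error)  # Expected (<class 'list'>, <class 'tuple'>), received <class 'str'>.
```

Failing fast at construction time keeps bad input from reaching the rendering backends, where the cause would be harder to trace.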

5. Variability: N/A

6. Quality Attributes:

  • Usability: User receives the scatterplot after inputting the minimal required data.

7. Rationale: The Mark base class offers a number of rendering options through its subclasses. Scatterplot converts the input values into the corresponding markers for plotting.

8. Usage Guide:

import numpy
import toyplot

x = numpy.linspace(0, 2 * numpy.pi)
y1 = numpy.sin(x)
y2 = numpy.cos(x)  # a second series; add it with axes.scatterplot(x, y2)
canvas = toyplot.Canvas(width=500, height=300)
axes = canvas.cartesian()
mark = axes.scatterplot(x, y1)

This will generate a plot using points.

5.4 Context Diagram

The following diagram outlines the context for the primary view to give a better understanding of the external elements used by the Toyplot system. It combines a pipe-and-filter diagram with a data flow diagram, a design decision made to highlight the different interactions Toyplot has with various types of external components.

5.5 Variability Guide

The C&C primary presentation view shown above may change depending on the user's input data and desired output format. Rather than entering data manually with Python statements, the user can import CSV data using a function in the toyplot.data module (classes and functions for working with raw data). As an alternative to numpy.ndarray input, Toyplot's data module can also import data from a pandas.DataFrame. After plot generation, instead of autorendering the plot within a Jupyter notebook, the user can explicitly export or display the figure using one of Toyplot's rendering backends or display modules, each of which requires its respective module (mp4, svg, html, browser), plus the toyplot.reportlab module for PDF or PNG images.

5.6 Behaviour Diagram

The diagram below provides a high-level view of the system during runtime execution. It is an activity diagram showing the flow between different actions. It also demonstrates efficiency by showing how Toyplot's components interact with one another, at which stage errors are handled, and which files execute each action. First, the user writes statements that enter data and create a canvas space, Cartesian coordinates, and a plot, in that order. The operation then runs through multiple checks in require.py to catch errors in every entry the user makes. If the checks pass, the entered data goes through multiple stages until a plot is produced. The entire process of producing a plot takes a fraction of a second.

5.7 Rationale

As stated in the Introduction and the selected Quality Attribute Scenario, one of the strongest cases for using Toyplot is the speed at which a user can generate a plot of their data. One of Toyplot's main benefits over its competitors is its linear data processing and few dependencies; we therefore chose a pipe-and-filter style diagram for our primary presentation to depict this. As that diagram shows, Toyplot's main functionality, from creating the canvas to overlaying the marks using the provided data, does not depend on any external services apart from third-party Python libraries. As a result, runtime processing is limited only by the user's machine, which can provide fast rendering given sufficient hardware.
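The pipe-and-filter style can be sketched as a chain of plain functions, each consuming only the previous stage's output. The stage names below are hypothetical and do not correspond to Toyplot functions; the sketch also assumes non-degenerate data domains.

```python
from functools import reduce


# Hypothetical filters: each takes the previous stage's output and returns a new value.
def make_data(_):
    xs = [0, 1, 2, 3, 4]
    return {"x": xs, "y": [x * x for x in xs]}


def add_canvas(state):
    return {**state, "width": 300, "height": 300}


def add_axes(state):
    # Record the data domains so the next stage can project into canvas space.
    return {**state,
            "xdomain": (min(state["x"]), max(state["x"])),
            "ydomain": (min(state["y"]), max(state["y"]))}


def add_mark(state):
    # Assumes max > min in both domains, which holds for the data above.
    (x0, x1), (y0, y1) = state["xdomain"], state["ydomain"]
    w, h = state["width"], state["height"]
    points = [((x - x0) / (x1 - x0) * w, h - (y - y0) / (y1 - y0) * h)
              for x, y in zip(state["x"], state["y"])]
    return {**state, "points": points}


def render(state):
    coords = " ".join(f"{px:.1f},{py:.1f}" for px, py in state["points"])
    return (f'<svg width="{state["width"]}" height="{state["height"]}">'
            f'<polyline points="{coords}" fill="none" stroke="black"/></svg>')


PIPELINE = [make_data, add_canvas, add_axes, add_mark, render]


def run(stages, initial=None):
    # Pipe the output of each filter into the next one, in order.
    return reduce(lambda value, stage: stage(value), stages, initial)


svg = run(PIPELINE)
```

Because no stage reaches back into an earlier one, the total cost is just the sum of the stages, which is the property the diagram is meant to highlight.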

Our context diagram outlines Toyplot's interactions with external elements while creating a graph from input data. To expand on the diagram, producing a plot is a fairly streamlined process from the user's perspective, with data supplied directly to the system. As for external dependencies, using NumPy arrays for data input rather than standard Python data structures allows optimized computation on two-dimensional data. Further, Toyplot minimizes internal and external dependencies, which decreases the overhead of loading and initializing a program.

As detailed in the Interfaces section and supported by our primary presentation, Toyplot's developers designed the system to meet its business goals for performance and usability by implementing convenient, streamlined interfaces and standard data inputs. Creating a Toyplot figure and displaying it in a Jupyter notebook requires only three lines of Python code; however, the system also enables the user to create more complex visuals if they wish.

Furthermore, the user does not need to manage any of the internal data structures or implementation details. The behaviour diagram exemplifies this separation between the actions required of the user and those performed by the system. The plot method handles the transformation from NumPy arrays to canvas marks and keeps rendering-specific details internal. All that is truly needed from the user is the data and the parameters describing how the resulting graph should be customized. Requiring minimal input from the user yields a more user-friendly experience that achieves results quickly.

Graphs in Toyplot contain three main parts: the canvas, coordinate systems, and marks. The canvas provides the background for plotting, coordinate systems determine the axes, and marks contain the body of the plot. With its convenience API, Toyplot allows users to create all three elements with minimal configuration, which enables fast prototyping and portability.
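This three-layer structure can be sketched with plain dataclasses. The names Canvas, cartesian, plot, and the (canvas, axes, mark) return value echo Toyplot's API as described in this chapter, but the classes below are simplified, hypothetical stand-ins rather than Toyplot's implementation; the bounds order is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Mark:
    """Body of the plot: the data series to draw."""
    xs: List[float]
    ys: List[float]


@dataclass
class Axes:
    """Coordinate system occupying a rectangle of the canvas and holding marks."""
    bounds: Tuple[float, float, float, float]  # assumed order: xmin, xmax, ymin, ymax
    marks: List[Mark] = field(default_factory=list)

    def plot(self, xs, ys):
        mark = Mark(list(xs), list(ys))
        self.marks.append(mark)
        return mark


@dataclass
class Canvas:
    """Top-level drawing area that holds coordinate systems."""
    width: float
    height: float
    children: List[Axes] = field(default_factory=list)

    def cartesian(self, bounds=None):
        # By default the axes fill the whole canvas.
        axes = Axes(bounds if bounds is not None else (0, self.width, 0, self.height))
        self.children.append(axes)
        return axes


def plot(xs, ys, width=300, height=300):
    """Convenience wrapper: build all three layers in one call."""
    canvas = Canvas(width, height)
    axes = canvas.cartesian()
    return canvas, axes, axes.plot(xs, ys)


canvas, axes, mark = plot([0, 1, 2], [0, 1, 4])
```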

6.0 Code Quality and Technical Debt

This milestone presents the results of running several automated code quality tools on Toyplot's codebase, specifically Codescene, SonarQube, and Understand. The reports generated by these tools enabled us to identify sources of technical debt the developers may have incurred during development.

6.1 Code Quality Report

6.1.1 Codescene Analysis

Codescene Cloud analyzes a github repository and reports on several important aspects including size, coupling, code churn, change frequency and author information across commits. Among the helpful visualizations is a package size diagram that immediately reveals 3 very large packages which are possible targets for refactoring due to their enourmous size:

A closer look shows that these packages also suffer from extremely high coupling and code churn. The coordinates package, for example, has 3,074 lines of code and 268% code churn, and is coupled with 549 other classes; the html package has 2,536 lines of code and 304% code churn, and is coupled with 568 other classes. Canvas, mark and color are the next most highly coupled packages. A refactoring could break the largest classes and functions into more specialized units with less coupling, which would improve maintainability.
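CodeScene derives churn and coupling from version-control history rather than from the source alone. The sketch below shows one simplified way to compute both from commit data. The input format (each commit mapping a file to lines added and deleted, as could be parsed from `git log --numstat`), the file sizes and both formulas are our assumptions; the toy history is invented and the sketch will not reproduce CodeScene's numbers.

```python
from collections import Counter
from itertools import combinations

# Hypothetical history: each commit maps a file to (lines_added, lines_deleted).
commits = [
    {"coordinates.py": (120, 40), "html.py": (60, 10)},
    {"coordinates.py": (30, 25), "canvas.py": (5, 1)},
    {"html.py": (80, 70), "coordinates.py": (10, 5)},
    {"color.py": (200, 0)},
]
current_loc = {"coordinates.py": 150, "html.py": 200, "canvas.py": 50, "color.py": 400}


def churn_ratio(path):
    """Lines added plus deleted over the whole history, relative to current size."""
    touched = sum(sum(commit[path]) for commit in commits if path in commit)
    return touched / current_loc[path]


def change_coupling():
    """Count how often each pair of files is modified in the same commit."""
    pairs = Counter()
    for commit in commits:
        for pair in combinations(sorted(commit), 2):
            pairs[pair] += 1
    return pairs


print(f"coordinates.py churn: {churn_ratio('coordinates.py'):.0%}")  # 153%
print(change_coupling().most_common(1))  # [(('coordinates.py', 'html.py'), 2)]
```

A churn ratio above 100% simply means the file has been rewritten more than once over its lifetime, which is why values such as 268% are possible.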

In terms of function size, the _render function is especially problematic, with 2,056 lines of code and a change frequency score five times higher than that of the next most problematic function. Refactoring should also reduce the overlap between the closely similar _render functionality found both in the __init__ package and in the render package.

6.1.2 SonarQube Analysis

SonarQube reports a series of metrics that analyze source code using common technical debt measures. First, issues in individual lines of code are flagged; then seven measures of the system are derived to give insight into potential operational risks: maintainability, reliability, security, coverage, duplications, size and complexity. Overall, the SonarCloud report generated for Toyplot rates the system as "OK", so it passed SonarQube's Quality Gate (the other possible values are "error" and "warn").

Overview

SonarQube classifies issues in the source code as Bugs, Vulnerabilities or Code Smells, each with a severity determined by language-specific rules. Reported issues are considered to affect the maintainability of the system, and the refactoring needed to resolve them constitutes technical debt. Toyplot carries four days and seven hours of technical debt, which is SonarQube's estimate of the total time required to refactor the source code to resolve the issues and metrics described below.

0 Bugs

Bugs are code that is "demonstrably wrong" and could cause the system to fail. SonarQube found no bugs of any severity in Toyplot's source code. This result should not be taken at face value, however: no codebase is entirely bug-free, and, more concretely, Toyplot's GitHub repository has outstanding issues labeled "bug".

0 Vulnerabilities

Vulnerabilities are weaknesses that attackers could potentially exploit. No vulnerabilities of any severity were found in the source code.

148 Code Smells

Code smells are neither bugs nor vulnerabilities, but issues that could lead to problems later on. The source code contained six Blocker and 33 Critical issues (operational or security risks), and 83 Major and 26 Minor issues (impacts on productivity). These severities correspond, respectively, to the impact and likelihood of the "worst thing" happening: high impact and high likelihood, high impact and low likelihood, low impact and high likelihood, and low impact and low likelihood. Most code smells were repetitions of the same type of issue. For example, all 33 Critical issues concerned functions with too high a Cognitive Complexity (see the Complexity metric below), which makes them difficult to maintain, and all six Blocker issues were lines using the outdated Python 2 print statement instead of the built-in print() function.
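The Blocker issue is easy to demonstrate: the Python 2 `print` statement is a syntax error under Python 3, where `print` is a built-in function. The small check below asks the running Python 3 interpreter to compile both forms; the message string is illustrative.

```python
def compiles(source):
    """Return True if the running interpreter accepts `source` as valid syntax."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


legacy = 'print "rendering canvas"'   # Python 2 statement form
modern = 'print("rendering canvas")'  # Python 3 built-in function

print(compiles(legacy), compiles(modern))  # False True
```

Code that must still run under Python 2.6 or later can opt into the function form with `from __future__ import print_function`, so the fix does not have to break older interpreters.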

Maintainability Rating: A

SonarQube gives the system a maintainability rating of A, which is to say that the outstanding remediation cost is less than or equal to 5% of the time that has already gone into the application. The scale is based on the effort estimated to fix all the previously mentioned maintainability issues categorized as code smells. Toyplot is estimated to require four days and seven hours of code refactoring and has a Debt Ratio of 0.5% (the cost to fix the software over the cost to develop the software).
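The two figures can be cross-checked against each other. The sketch below assumes what we understand to be SonarQube's defaults: a development cost of 30 minutes per line of code, an 8-hour working day, and an A rating for debt ratios up to 5%. It takes the 16,978 lines of code reported under Size as the denominator.

```python
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 8        # assumed SonarQube default working day
COST_PER_LINE_MIN = 30   # assumed default development cost per line of code

# Upper bound of the debt ratio for each grade (assumed SonarQube defaults).
RATINGS = [(0.05, "A"), (0.10, "B"), (0.20, "C"), (0.50, "D")]


def remediation_minutes(days, hours):
    return (days * HOURS_PER_DAY + hours) * MINUTES_PER_HOUR


def debt_ratio(remediation_min, ncloc):
    """Remediation cost divided by the estimated cost of writing the code."""
    return remediation_min / (ncloc * COST_PER_LINE_MIN)


def maintainability_rating(ratio):
    for limit, grade in RATINGS:
        if ratio <= limit:
            return grade
    return "E"


debt = remediation_minutes(days=4, hours=7)   # 2,340 minutes
ratio = debt_ratio(debt, ncloc=16_978)        # about 0.46%
print(f"{ratio:.1%}", maintainability_rating(ratio))  # 0.5% A
```

Under these assumptions the ratio rounds to the reported 0.5%, so the debt estimate and the Debt Ratio are consistent with each other.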

Reliability Rating: A

As with the Maintainability rating, the system receives a Reliability rating of A because no bugs were found. This metric measures the remediation effort needed to fix bugs, so no effort is required.

Security Rating: A

Likewise, the system received a Security rating of A, which is based on the number and severity of the vulnerabilities found. Since none were found, no remediation effort is required.

Duplications

This metric measures how much code is considered duplicated, which can be a long-term risk: 617 lines in 28 blocks across nine files. The system has a duplication density of 2.6%, the ratio of duplicated lines to total lines.
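The density follows from the totals: 617 duplicated lines out of the 24,163 lines reported under Size is about 2.6%. The sketch below reproduces that arithmetic and adds a deliberately simplified duplicate detector that hashes every window of consecutive non-blank lines. The window size of 3 and the whitespace normalization are our assumptions; SonarQube's actual detection uses its own, more sophisticated rules and thresholds.

```python
from collections import defaultdict


def duplicated_lines(lines, window=3):
    """Return indices of lines belonging to a window that appears more than once."""
    normalized = [line.strip() for line in lines]
    seen = defaultdict(list)
    for start in range(len(normalized) - window + 1):
        block = tuple(normalized[start:start + window])
        if all(block):  # skip windows that contain blank lines
            seen[block].append(start)
    duplicated = set()
    for starts in seen.values():
        if len(starts) > 1:
            for start in starts:
                duplicated.update(range(start, start + window))
    return duplicated


source = """\
x = load()
y = scale(x)
draw(y)

x = load()
y = scale(x)
draw(y)
save()
""".splitlines()

dup = duplicated_lines(source)
print(len(dup), f"{len(dup) / len(source):.1%}")  # 6 75.0%
print(f"Toyplot: {617 / 24_163:.1%}")               # Toyplot: 2.6%
```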

Size

Lines of code (LOC) are a typical measure of system size used by most technical debt analysis tools. Toyplot has a total of 24,163 lines, of which 16,978 are lines of actual code, and a comment density of 18.8%. There are nine directories, 128 files, 104 classes, 1,295 functions, and 9,240 statements. However, much of the "codebase" is not program logic: there are several .csv data files, and thousands of lines in color.py, for example, simply declare tuples of RGB color values.
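To show how these line categories relate, the sketch below classifies lines naively (blank, `#` comment, or code; unlike real tools it does not handle docstrings or multi-line strings) and computes comment density as comment lines divided by code plus comment lines, which is how we understand SonarQube defines it. The sample module is hypothetical; it also illustrates the point about color.py, since five of its seven code lines are a literal data table.

```python
def classify(lines):
    """Count code, comment and blank lines (naive: docstrings count as code)."""
    counts = {"code": 0, "comment": 0, "blank": 0}
    for line in lines:
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith("#"):
            counts["comment"] += 1
        else:
            counts["code"] += 1
    return counts


sample = """\
# Named colors as RGB tuples (hypothetical data table).
palette = {
    "red": (1.0, 0.0, 0.0),
    "green": (0.0, 0.5, 0.0),
    "blue": (0.0, 0.0, 1.0),
}

def lookup(name):
    # Fall back to black for unknown names.
    return palette.get(name, (0.0, 0.0, 0.0))
""".splitlines()

counts = classify(sample)
comment_density = counts["comment"] / (counts["code"] + counts["comment"])
print(counts, f"{comment_density:.1%}")
# {'code': 7, 'comment': 2, 'blank': 1} 22.2%
```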

0% Coverage

This metric measures how much of the source code is covered by the system's unit tests. SonarQube reported no lines covered during unit test execution, but this figure is misleading, as the /tests folder contains over 50 tests. SonarQube does not run tests itself; it imports coverage reports produced by an external tool, so 0% most likely means that no such report was supplied. See the pylint and nose outputs from running Toyplot's regression tests for comparison.
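Line coverage itself is simple to state: the fraction of executable lines that run at least once while the tests execute. The sketch below measures it for a single function using the standard library's `sys.settrace` hook and `dis.findlinestarts`. It is a toy stand-in for a real tool such as coverage.py, and the `clamp` function and the two test calls are hypothetical.

```python
import dis
import sys


def clamp(value, low, high):
    if value < low:
        return low
    if value > high:
        return high
    return value


def executable_lines(func):
    """Body line numbers that have bytecode, relative to the `def` line."""
    code = func.__code__
    return {
        line - code.co_firstlineno
        for _, line in dis.findlinestarts(code)
        if line is not None and line != code.co_firstlineno
    }


def run_with_coverage(func, calls):
    """Call `func` once per argument tuple and record which body lines ran."""
    code = func.__code__
    hit = set()

    def local_trace(frame, event, arg):
        if event == "line":
            hit.add(frame.f_lineno - code.co_firstlineno)
        return local_trace

    def global_trace(frame, event, arg):
        # Only trace frames that execute the function under test.
        return local_trace if frame.f_code is code else None

    sys.settrace(global_trace)
    try:
        for args in calls:
            func(*args)
    finally:
        sys.settrace(None)
    return hit


lines = executable_lines(clamp)
covered = run_with_coverage(clamp, [(5, 0, 10), (-1, 0, 10)]) & lines
print(f"{len(covered)}/{len(lines)} lines covered")  # `return high` never runs
```

A coverage tool does essentially this across the whole package while the test suite runs, then writes the results to a report that SonarQube can import.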

Complexity

This metric describes how simple or complicated the control flow of the system is. Cyclomatic Complexity, a common measure that approximates the minimum number of test cases needed to exercise every independent path through the code, totals 2,607 for Toyplot. SonarQube calculates it from the number of paths through the code: each function starts with a complexity of one, and the counter is incremented by one whenever control flow splits. The files with the highest complexity are coordinates.py (399) and html.py (292), which agrees with CodeScene's analysis. Cognitive Complexity, by contrast, measures how difficult the code is for a person to understand.
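The counting rule described above (start at one, add one whenever control flow splits) can be approximated with Python's `ast` module. This is a simplified sketch, not SonarQube's implementation: we assume that `if`/`elif`, loops, `except` handlers, conditional expressions, comprehension `if` clauses and each extra operand of `and`/`or` count as decision points, and the `axis_label` sample is hypothetical.

```python
import ast

BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source):
    """Approximate McCabe complexity of every function defined in `source`.

    Simplification: nested functions are also counted inside their parent.
    """
    results = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            complexity = 1  # every function starts at one
            for child in ast.walk(node):
                if isinstance(child, BRANCH_NODES):  # `elif` parses as a nested If
                    complexity += 1
                elif isinstance(child, ast.BoolOp):
                    complexity += len(child.values) - 1  # `a and b` splits control flow
                elif isinstance(child, ast.comprehension):
                    complexity += len(child.ifs)
            results[node.name] = complexity
    return results


sample = '''
def axis_label(value, unit=None):
    if value is None:
        return ""
    elif unit and value > 0:
        return f"{value} {unit}"
    for ch in str(value):
        if not ch.isdigit():
            break
    return str(value)
'''

print(cyclomatic_complexity(sample))  # {'axis_label': 6}
```

Cognitive Complexity would additionally weigh the `if` nested inside the loop more heavily than a top-level branch, which is why the two metrics can rank functions differently.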

6.1.3 Understand Analysis

We also used Understand as a supplemental tool. Although Understand mainly provides an overview of project structure rather than in-depth code quality analysis, some of its features contributed to our understanding of the project.

Understand's CodeCheck feature runs the target files through a code violation check based on several widely used industry standards. The results were mostly positive, apart from 20 violation instances and two recommendations: "Files too long" and "Overly complex functions". The following treemap provides an intuitive picture of the violation density.

6.2 Technical Debt

Throughout the codebase, the developer has left notes to come back and fix things later, covering issues such as missing logging in certain methods, hardcoded strings, and non-generic abstractions. These are clearly not overarching architecture or design flaws, however. It is more important to examine the ways in which the developers have accumulated long-term technical debt.

  1. One of our original business goals was fast rendering, and there are improvements to be made in this area. In issue #171, Timothy Shead describes how, when exporting a graph as HTML and rendering a DOM, Toyplot currently writes out every CSS property needed to render the entire graph. Modern browsers, by contrast, keep track of the current state of the DOM and write out only the CSS properties that have changed, which requires less computation. Toyplot should adopt this approach and write out only the properties of the graph that have changed, to increase rendering speed.

  2. When rendering to an HTML file, Toyplot can show the current coordinates of the mouse cursor on the page. The code that calculates and generates this is located in the html module. Unfortunately, not only is the JavaScript code in the HTML output monolithic, but much of its functionality duplicates the projection module. During development, corners were cut to avoid abstracting the coordinate calculations into a central location, and the result is duplicated functionality across the codebase. There should be a way to merge the JavaScript code with the projections/locators and to make new projections and locators easier to create, for example through a base class that implements the required logic.
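One possible reading of the first improvement is that each element should write only the properties that differ from the state it already has, here taken to be the style inherited from its parent, rather than its complete style. The sketch below illustrates that interpretation on a hypothetical tree of dictionaries; it is our assumption about the approach, not the design discussed in issue #171. It is only valid for inherited properties such as `fill`, `stroke` and `stroke-width`; non-inherited ones (for example `opacity`) would still have to be written on every element.

```python
def minimal_styles(node, inherited=None):
    """Yield (id, declarations), writing only properties not already inherited."""
    inherited = inherited or {}
    style = node["style"]  # the full style this element needs
    delta = {prop: value for prop, value in style.items() if inherited.get(prop) != value}
    yield node["id"], ";".join(f"{prop}:{value}" for prop, value in delta.items())
    for child in node.get("children", []):
        # Children inherit everything this node has set, directly or by inheritance.
        yield from minimal_styles(child, {**inherited, **style})


tree = {
    "id": "axes",
    "style": {"stroke": "black", "stroke-width": "1"},
    "children": [
        {"id": "line-1", "style": {"stroke": "black", "stroke-width": "2"}},
        {"id": "line-2", "style": {"stroke": "crimson", "stroke-width": "1"}},
    ],
}

for element_id, css in minimal_styles(tree):
    print(element_id, css)
# axes stroke:black;stroke-width:1
# line-1 stroke-width:2
# line-2 stroke:crimson
```

A naive renderer would write six declarations for this tree; the delta version writes four, and the savings grow with the number of marks that share a parent's style.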

7.0 Conclusion

Documentation is important for developing, maintaining, or understanding a system because it reduces the cost of:

  1. Future changes, by recording important details of the code architecture and the system's design patterns. Without documentation, the project might be at the mercy of its developers.

  2. Maintenance, by depicting how different classes interact in specific use case scenarios.

  3. Knowledge transfer, by presenting the nature of the system and how it works. This helps new developers get up to speed faster and provides the high-level understanding of the system that is needed not only by developers and engineers, but also by the stakeholders.

While documenting Toyplot, we gained an overall understanding of the whole project, from high-level details (e.g. business goals, quality attributes, ASRs) down to low-level details (e.g. code and server environment). Since SonarQube found no bugs or vulnerabilities, Toyplot appears to be in very good standing in terms of reliability and security. Although the total cost of refactoring Toyplot is less than 5% of the work that has already gone into it, refactoring is still necessary because of the extremely high coupling between components of the system. Toyplot is an open-source project to which anyone can contribute under rules set by the developers, and keeping a project open for others to modify and extend demands a high degree of loose coupling to keep it maintainable in the future.

8.0 References

[1] Sandia National Laboratories, "The Toyplot Ethos", Toyplot Documentation, 2014 [Online]. Available: https://toyplot.readthedocs.io/en/stable/ethos.html. [Accessed: 17-Jan-2018]

[2] Sandia National Laboratories, "Release Notes", Toyplot Documentation, 2018 [Online]. Available: https://toyplot.readthedocs.io/en/stable/release-notes.html. [Accessed: 25-Jan-2018]

[3] Sandia National Laboratories, "Quality Attribute Generic Scenarios", Article, [Online]. Available: http://sa.inceptum.eu/sites/sa.inceptum.eu/files/Content/Quality Attribute Generic Scenarios-2.pdf

[4] L. Bass, P. Clements and R. Kazman, Software Architecture in Practice, 3rd ed. Upper Saddle River, NJ: Addison-Wesley, 2013.

[5] P. Clements, Documenting Software Architectures: Views and Beyond. Boston: Addison-Wesley, 2003.
