TensorBoard

List of Figures

Figure 2.1: The utility tree for TensorBoard

Figure 4.1: Primary presentation showing a modular view of the TensorBoard project

Figure 4.2: Context view showing the interaction of external entities and their usage dependencies

Figure 4.3: Sequence diagram of TensorBoard outlining that the event data was preprocessed before being displayed to the user

Figure 5.1: TensorBoard latency QAS as seen through a pipe and filter view

Figure 5.2: View into the behaviour of TensorBoard Elements

Figure 5.3: TensorBoard within its runtime environment

Figure 6.1: Report from local SonarQube Server for all projects

Figure 6.2: Report from SonarCloud for all projects

Figure 6.3: Report from Local SonarQube Server for TensorBoard

Figure 6.4: Hotspots report from CodeScene identifying the modules with most development activity

Figure 6.5: Refactoring Target report from CodeScene

Figure 6.7: Function Complexity Trend from CodeScene

Figure 6.8: Code Age Report from CodeScene

Figure 6.9: Function comparison to identify the internal temporal coupling

Figure 7.1: High Duplications before refactoring

Figure 7.2: High Duplications in files before refactoring

Figure 7.3: Reduced duplication after refactoring

1.0 Introduction

TensorBoard is a visualization suite created to complement TensorFlow. Like TensorFlow, TensorBoard is open source and released under the Apache 2.0 license by Google. As a result, the main contributors to TensorBoard are Google employees; however, since it is open source, anyone can contribute to the project. TensorBoard provides insights into almost all neural network processes, structures, and results to enable neural network research and development.

Commonly, the creation, propagation, and administration of individual nodes and weights have been treated as transpiring within a black box. Analyzing a neural network at the granularity of each node's type, the connections between nodes, and their associated weights has never been so easily achieved as it now is with TensorBoard.

TensorBoard provides a suite of visualization tools to make it easier to understand, debug, and optimize TensorFlow programs. These tools allow for graphing, plotting quantitative metrics, and visualizing additional data such as images or audio. TensorBoard enables these representations through its highly modularized development. It offers many different tools to handle different types of data, such as images or audio, and it also has plugins that represent the same data generated by a TensorFlow program in multiple ways, such as scalars or histograms.

TensorBoard’s modular development approach contains related files within a single directory, which allows for easy modifiability and maintainability. Since the tools provided by TensorBoard, referred to as plugins, are contained within their own directories and generally have only a handful of relations to files outside of them, adding or removing a tool is a straightforward process for any stakeholder to understand or implement.
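As a hypothetical sketch of this plugin-per-directory idea (the class names and registry below are illustrative inventions, not TensorBoard's actual plugin API), each tool can be modelled as a self-contained plugin that the application discovers through a single registry:

```python
class Plugin:
    """Base class: each plugin lives in its own directory and only needs
    to expose a name and a render routine to the rest of the system."""
    name = "base"

    def render(self, data):
        raise NotImplementedError


class ScalarPlugin(Plugin):
    name = "scalars"

    def render(self, data):
        return f"line chart of {len(data)} scalar points"


class HistogramPlugin(Plugin):
    name = "histograms"

    def render(self, data):
        return f"histogram over {len(data)} values"


# The application discovers plugins through a single registry, so adding
# or removing a tool touches only the plugin's own directory plus this list.
REGISTRY = {p.name: p for p in (ScalarPlugin(), HistogramPlugin())}


def dispatch(plugin_name, data):
    """Route a request to the plugin that owns the named dashboard."""
    return REGISTRY[plugin_name].render(data)
```

For example, `dispatch("scalars", [0.1, 0.2, 0.3])` routes to the scalar tool without the caller knowing anything about the other plugins, which is the property that keeps additions and removals localized.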

2.0 Identifying Stakeholders and Business Goals

The identification of stakeholders and business goals provides an understanding of those who are impacted by this project, the degree to which they are impacted, and the general direction of the project as derived from the business goals. By knowing the business goals of an organization's project, an architecture can be identified that will best serve the organization and its future endeavours. Further, once an architecture is identified, the stakeholders can be efficiently orchestrated so that the goals are readily realized.

2.1 Stakeholders

1. Contributing to the growth and continuity of the organization

TensorBoard was developed to complement the TensorFlow library. TensorBoard allows for the visualization of computational graphs, the representation of metrics about the execution of graphs, and the display of other data that passes through it, such as images. As a software complement to TensorFlow, the continuity of the TensorBoard organization depends entirely on the continuity of the TensorFlow library. This is echoed in the DEVELOPMENT.md file, which says, "TensorBoard at HEAD relies on the nightly installation of TensorFlow…" [1]. As such, it becomes apparent that the growth and continuity of TensorBoard will mirror that of TensorFlow. By supporting visualization of TensorFlow models, TensorBoard encourages adoption by industry practitioners and academics alike. The visual aids that TensorBoard offers are helpful when constructing complex neural networks, as they can include intricate looping, layers, and mappings between input and output. Together, the TensorBoard and TensorFlow toolkits also feed back into Google’s overall artificial intelligence and machine learning goals by facilitating research into those areas through high-quality tools.

2. Meeting financial objectives

TensorBoard is released under the Apache License 2.0 [2]. As a result, users are able to use, distribute, and modify the software for any purpose without concerns about royalties. In other words, it is completely free to use, and modified versions of the software may even be sold, so direct monetization of this software is not possible. That being said, these tools are used by developers inside Google, which is itself an intelligent adware company, among many other things, and so these tools directly support Google’s primary source of revenue by aiding adware development efforts. The better Google’s machine learning algorithms can map advertisements to customers, the more it can sell those advertisements for.

3. Meeting personal objectives

Many companies participate in open source software projects to increase their brand awareness. It is likely that both TensorFlow and TensorBoard were developed primarily internally and then released to the public, both to promote innovative research in the field that Google could then utilize and to improve its own tools through free developer hours contributed by the community. It could also be that the developers working at Google are philanthropic individuals who care about giving back to the industries in which they have made their careers.

4. Meeting responsibility to employees

In creating TensorBoard, Google would certainly have a business goal of involving the team that created TensorFlow to facilitate clear APIs between systems, meet functional requirements and quality attributes, and ensure that the development burden of TensorBoard is shared fairly by the right employees. This responsibility to its employees is important for the project to be a long-term and well-supported success. Without it, the project would likely not succeed in its other goals.

5. Meeting responsibility to society

TensorBoard helps developers make higher quality AI, and this in turn improves the quality of systems that the common populace are exposed to. AI tools in the world have had massive impacts on the way that we live our lives, and so TensorBoard is helping that aspect of technology improve.

6. Meeting responsibility to state

As machine learning tools become more ubiquitous but also more mission critical, government regulation will likely increase in response to the increased risk this implies. One of the critical challenges in certifying and regulating software is mapping requirements to implementation. TensorBoard can help developers show regulators directly how a requirement is met in a deployed software system, by visualizing elements of the system that implements the requirement, or it can help developers understand where those requirements are met in the system, enabling them to demonstrate in other ways that the system meets its requirements.

7. Meeting responsibility to stakeholders

By making TensorBoard open source, and thus fully the responsibility of its users, Google has wiped its hands of any liability, or legal responsibility to systems produced with the aid of TensorBoard. This is a beneficial strategy for Google because tools like TensorBoard have the potential to be misused, and/or cause damages both in industrial and public settings.

8. Managing marketing position

Google is widely considered to be a world leader in artificial intelligence and machine learning. By positioning itself as the provider of an open source platform useful in machine learning development, Google has ensured that its market share of tools used within machine learning will be substantial, and it has supported its brand image as one of openness and philanthropy within the industry [3]. TensorBoard may also feed back into Google through the invention of new tools, concepts, and intellectual property. But mostly, it is the fact that TensorBoard is a free addition to TensorFlow, which is itself free to use, that ensures Google continues to be prominent in the machine learning and artificial intelligence fields.

9. Improving business processes

Because TensorBoard aids developers in understanding their machine learning tools, it is possible that machine learning tools optimizing for cost reduction in one of Google’s key revenue generators, such as advertising, would improve because of TensorBoard. Improvements to key revenue generators would absolutely yield business process improvements. This is also true of other key aspects of Google's business processes, such as customer support and product refinement. Separately, because TensorBoard is an open source project, improvements to its source code are easily facilitated by its core end users, who can submit pull requests directly to the project on GitHub. This serves as a direct customer service line through which end users can enhance their usage of TensorBoard, and Google developers can support increased utility of TensorBoard for its users.

10. Managing the quality and reputation of the products

Customer expectation is an integral component of TensorBoard. TensorBoard has been a part of TensorFlow since TensorFlow was open sourced by Google in 2015. Google is considered one of the most reputable companies in terms of corporate responsibility, and by extension so are its products, including the TensorBoard API. To manage the quality and reputation of its products, Google needs to manage its branding, the release of products (and any subsequent recalls), the types of potential users, the quality of existing and new products, and its testing, support, and related strategies.

11. Managing change in environmental factors

The business context for TensorBoard is subject to change, as with any software system. For these changes, we consider what might change in the business goals of TensorBoard. Environmental factors that may alter current business goals include the social, legal, competitive, customer, and technological environments. One example is an alteration in the customer environment: as competing systems for the visualization of computational graphs emerge, the needs and desires of customers may change depending on the information they are attempting to visualize and their need for a system that represents metrics in a visual context.

2.2 Business Goals

Role

Concerns

Role Instances

Contextual Concerns Description

Acquirers

Oversee the procurement of the system or product

TensorFlow developing team, End users (Both from companies and non-profit individuals or groups)

Since TensorFlow is a complex tool for machine learning and neural network computation, a visual aid is required to analyze the performance and procedures of TensorFlow models. TensorBoard is a significant feature of TensorFlow. Neural networks are often treated as black boxes, leaving their developers to wonder what kind of structure they contain and making it hard to figure out how they are trained. TensorBoard makes it possible to visualize the complex neural network training process, to better understand, debug, and optimize the program [4]. It provides inspectable and understandable graphs of TensorFlow's running progress. In summary, TensorBoard is the perfect addition to TensorFlow, making it more effective and understandable for developers and users. As free software, many non-profit individuals and groups such as students, universities, and technophiles may adopt the combination of TensorFlow and TensorBoard. Many companies have applied TensorFlow as their numerical computation solution, such as Airbnb, Nvidia, Uber, etc. [5]. Though there is no evidence to prove that these companies also use TensorBoard as an analysis tool, the bundled installation makes TensorBoard an inseparable part of TensorFlow, so these companies can be considered potential TensorBoard users and acquirers.

Assessors

Oversee the system’s conformance to standards and legal regulation

TensorBoard project owner such as development team or supervisor, Open-source communities like Apache

First, as an open source project, the TensorBoard core developing team or supervisor has the responsibility to keep the project conformant. They review pull requests from the community and ensure those changes contribute to the project in desirable ways. The project should conform with open source licenses. TensorBoard uses Apache License 2.0 as its copyright license [1]. As a GitHub project, it also must meet GitHub privacy statements.

Communicators

Explain the system to other stakeholders via its documentation and training materials

TensorBoard documentation maintainers

As an open source project, anyone with the required permissions could be a communicator on this project. From research on the history of the official documentation on GitHub, we find there have been several publishers during the history of TensorBoard. "TensorFlower-gardener" built the first version of the README markdown file to introduce the TensorBoard project. After their departure, several committers, such as "dandelionmane", "wchargin", "jart", and "nfelt", continually contributed to this documentation to ensure that an approachable introduction and training material remain easily accessible for users and developers. There is an official website that introduces how to use TensorBoard in detail. There is no author signature on the webpage, but considering it is an official webpage, it was presumably published by the TensorBoard developing team.

Developers

Construct and deploy the system/product from specifications (or lead the teams that do this)

TensorBoard developing teams and other contributors on GitHub

The main developers are researchers and engineers working on the Google Brain team who built the original version of TensorBoard/TensorFlow [6]. External developers contribute code and bug fixes on GitHub. Through GitHub's insights, we can see that a total of 106 contributors have made contributions to this project, and the top four of them, "dandelionmane", "dsmilkov", "jart", and "TensorFlower-gardener", contributed around 90% of the development workload [7]. Except for "TensorFlower-gardener", who has emptied their GitHub space, the other three are all employees at Google. As mentioned in the Assessors section, due to the complexity of TensorFlow, a visual tool makes it easier for developers and users to understand and adjust TensorFlow procedures. This internal requirement naturally makes TensorBoard a necessary and inseparable part of TensorFlow.

Maintainers

Manage the evolution of the system/product once it is operational

Users or analysts of TensorFlow

TensorBoard is a visual tool based on TensorFlow and is often used while TensorFlow is running, so we suppose the maintainers of TensorBoard would be the users themselves. There may also be professional analysts or testers who run TensorBoard for evaluation; they could all be maintainers who manage TensorBoard. As maintainers, they have the responsibility to manage the evolution and life cycle of TensorBoard. In contrast to the running results, they are mostly concerned with the running status.

Production Engineers

Design, deploy, and manage the hardware and software environments in which the system will be built, tested, and run

Users or analysts of TensorFlow

Like maintainers, production engineers are likely also users of TensorFlow, whether companies or individuals. TensorBoard is often deployed on a personal computer, such as a laptop or desktop, and these individuals are usually the major operating personnel who design and deploy the local environment. Certainly, in some companies, resource and configuration management staff would be responsible for this. They provide the proper infrastructure, including hardware, OS, and matching software, to make TensorBoard run correctly.

Suppliers

Build and/or supply the hardware, software, or infrastructure on which the system will run

Users, such as individuals or companies; providers of packages used by TensorBoard

Basically, the suppliers are the users themselves, whether individuals or companies. TensorBoard's running environment should be provided and set up by the user, who is also responsible for maintaining TensorBoard's running status. Meanwhile, the providers of packages that TensorBoard runs on or relates to, such as Python, Docker or VirtualEnv, and TensorFlow, could also be considered suppliers in a general sense. However, open source providers such as Python do not care much about TensorBoard because they do not profit from the package.

Support Staff

Provide support to users for the product or system when it is running

Technical support team / GitHub users

Since TensorBoard is an open source project, users can submit their issues via GitHub, and both the technical support team and other GitHub users can answer their questions. GitHub users can even submit pull requests and help the TensorBoard team fix bugs. The quality of the project directly affects the work of support staff; for example, as the quality of the product increases, it may reduce the workload of the support staff to some extent. The TensorBoard website includes a support section in the footer with community-centric issue resolution, including links to Stack Overflow, their social media, and the GitHub issue tracker. These avenues allow for quick interactions between users and support staff.

System Administrators

Run the system once it has been deployed

Individual(s) / teams who make the TensorBoard available to use

System administrators encompass a wide variety of duties. For example, they would review all pull requests. Typically they deal with installation, user administration, maintenance, general support, and documentation. Their responsibility, and their main concern, is that TensorBoard can be used normally.

Testers

Test the system to ensure that it is suitable for use

Test team of TensorBoard / developers

A possible set of testers could include those familiar with TensorBoard and its functionality, such as a subset of the TensorBoard team (possibly outside of development). Depending on which functionality they would like to test, they could use a variety of novice, intermediate, and experienced developers to test various areas. This could include extensions of functionality and improvements to labelling and signifiers. Their responsibility is to test modules, functions, or systems of TensorBoard and give feedback on whether the development or updates meet the requirements. They care about the quality of the code submitted by developers because this quality directly affects the testers' work. The testers also care about their feedback because it can influence the quality of the project and future project directions.

Users

Define the system’s functionality and ultimately make use of it

Individual / group / companies (airbnb, nvidia etc.)

There is a wide range of users for TensorBoard, including individuals, groups, and entire companies. Individuals may be hobbyists who wish to learn more about neural networks, students studying AI courses, or researchers who want to apply the power of neural networks to their field of study. Companies may, and do, use TensorBoard to help them develop products to be commercialized. Although the TensorBoard website does not list companies that use TensorBoard, there are companies listed on the TensorFlow website. Each of these companies (Airbnb, Nvidia, etc.) [5] is likely to be, or to have been, a user of TensorBoard. These users care about the functionality of TensorBoard, such as whether TensorBoard can accurately and efficiently visualize TensorFlow graphs and whether the API provided allows them to be effective in their coding.

3.0 Architecturally Significant Requirements and Utility Tree

Architecturally significant requirements (ASRs) are fundamental portions of a software architecture that have a significant or measurable impact on the software quality and cost [8].

3.1 List of Architecturally Significant Requirements

  • When the TensorBoard server is running, browsers that navigate to the correct localhost URL will see the TensorBoard interface loaded with TensorFlow data if it is available.

  • The TensorBoard website, which contains all of the documentation for TensorBoard’s use, must be available 24/7.

  • TensorBoard must be able to consume TensorFlow protobuf data.

  • TensorBoard must render in a webpage.

  • TensorBoard must provide interactions that can be supported in a browser.

  • TensorBoard must be built in a way that open source contributors can understand.

  • TensorBoard must compartmentalize related elements.

  • TensorBoard must be efficient in terms of interaction and data loading latency.

  • TensorBoard must acceptably return visual metrics of the TensorFlow data.

  • TensorBoard must not require computational intensity exceeding the capability of the host machine to complete tasks.

  • TensorBoard must accept periodic, stochastic, and sporadic event arrival patterns from all input sources.

  • TensorBoard processing time should be acceptable to end users.

  • TensorBoard must enable continuous integration and testing.

  • TensorBoard must allow for the control of each component’s internal state and input while allowing the output to be observed.

  • TensorBoard must be testable by a multitude of developers and users.

  • TensorBoard must provide console feedback to users while running in the background.

  • TensorBoard must never fail silently.

  • TensorBoard must support all manner of neural network architectures.

  • TensorBoard must offer the option to clean up complex or unstructured graphs.

  • TensorBoard must enable summaries of TensorFlow model runs.

  • TensorBoard must offer hyperparameter searching.

  • TensorBoard must render a variety of embeddings of resulting classifications.

  • TensorBoard must handle a variety of input data types (audio, video, images, text etc).

  • TensorBoard must run within the hardware constraints of modern laptops.

  • TensorBoard must be able to evaluate a subset of data from TensorFlow.

  • TensorBoard must support comparison of multiple TensorFlow executions.

  • TensorBoard must be able to show a text dashboard.

  • TensorBoard must minimize the impact of an error.

  • TensorBoard must distinguish between CPU and GPU implementations.

  • TensorBoard must be fault tolerant.

  • TensorBoard must be a platform capable of allowing novice users to learn, experiment and provide unexpected inputs.

  • TensorBoard must run within a plurality of web browsers.

  • TensorBoard source code must be available and accepting commits from developers globally during all scheduled up-time.

  • TensorBoard oversight must be efficient in observation and confirmation of the validity of changes made.

  • TensorBoard oversight must have means to accept bug reports from users.

  • TensorBoard must generate bug reports.

  • Maintenance of TensorBoard must be independent of TensorFlow, i.e., it can be installed and maintained separately.

  • TensorBoard must be able to be rapidly deployed.

  • TensorBoard must be an independent application and should not impact the running of TensorFlow.
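Several of these requirements, notably consuming TensorFlow protobuf data and minimizing the impact of an error, hinge on reading TensorFlow's event files, which store length-prefixed, checksummed protobuf records. The sketch below reproduces that record framing in plain Python; the bit-by-bit CRC32C is written for clarity rather than speed, and the byte string used in the example merely stands in for a serialized Event proto:

```python
import struct

CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial


def crc32c(data: bytes) -> int:
    # Bit-by-bit CRC32C; real readers use table- or hardware-accelerated code.
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    # The record format stores a rotated-and-offset CRC rather than the raw one.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_record(payload: bytes) -> bytes:
    """Frame one payload: length, CRC of length, payload, CRC of payload."""
    header = struct.pack("<Q", len(payload))
    return (header
            + struct.pack("<I", masked_crc(header))
            + payload
            + struct.pack("<I", masked_crc(payload)))


def read_record(buf: bytes) -> bytes:
    """Parse one framed record and verify its payload checksum."""
    (length,) = struct.unpack_from("<Q", buf, 0)
    payload = buf[12:12 + length]
    (stored,) = struct.unpack_from("<I", buf, 12 + length)
    assert stored == masked_crc(payload), "corrupt record"
    return payload
```

Because each record carries its own checksum, a reader can detect a damaged record and skip or report it rather than failing silently, which is one way the fault-tolerance requirements above can be met at the data layer.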

3.2 Prioritized List of Scenarios

Quality Attribute

ASR

QAS (with response information)

Technical Complexity

Business Complexity

Order of Priority

Performance

TensorBoard must be efficient in terms of latency

A user loads data from TensorFlow; the summary of the data is shown in less than 0.05 seconds.

H

H

1

Usability

TensorBoard must handle a variety of input data types (audio, video, images, text etc)

TensorBoard can completely analyze and visualize the result data from multimedia data training

H

H

2

Performance

processing time should be acceptable

A user trains against 55,000 MNIST items and the display time is less than 3 seconds

H

H

3

Reliability

TensorBoard must be fault tolerant.

A fault occurs and the system is able to catch and handle the exception, while continuing on with the processing of data. A log is generated and functionality continues.

H

H

4

Testability

TensorBoard must be testable by a multitude of developers, users, etc.

Under heavy load (more than 1000 parallel users), update a simple entity in persistent storage in less than 2 seconds in 95% of cases

M

L

5

Interoperability

TensorBoard must provide interactions that can be supported in a browser

Users can utilize TensorBoard on their home/work machines; the webpage should respond in less than 0.05 seconds, and any webpage provided by TensorBoard can be accessed by the user.

M

H

6

Usability

TensorBoard must render a variety of embeddings of resulting classifications

When a user loads 55,000 MNIST items into TensorFlow for training and uses the embedding APIs, the TensorBoard embedding projector shows the visual classification in less than 1 minute.

M

H

7

Usability

Support comparison of multiple executions

A user runs multiple instances of an incrementally modified TensorFlow algorithm. The user must be able to overlay each data set on a single graph. These graphs should include histograms, execution time, etc.

M

H

8

Testability

TensorBoard must enable continuous integration

The TensorBoard team is able to summarize and test new requirements every day.

H

H

9

Usability

TensorBoard must support all manner of network architectures

Different users run TensorBoard on different network architectures, such as multilayer perceptrons, convolutional neural networks, recursive neural networks, and recurrent neural networks, and all of them can be successfully visualized.

H

H

10

Usability

TensorBoard must enable summaries of TensorFlow model runs

After a user loads 55,000 data items from TensorFlow, the summary of the data can be shown in 3 seconds with proper settings.

H

H

11

Modifiability

Changing a TensorFlow model cannot affect TensorBoard

After a user updates the version of TensorFlow, no TensorBoard model created by the user is changed.

L

M

12

Usability

TensorBoard must offer the option to clean up complex or unstructured graphs

If a visualization is created under normal conditions, users can change the visualization type and data types in under 2 seconds once selected.

L

M

13

Interoperability

TensorBoard must be able to consume TensorFlow protobuf data

A user has a TensorFlow model that contains a .proto file; TensorBoard provides an API to analyze the data in 1 second.

H

M

14

3.3 Quality Attribute Scenario Templates

Scenario Priority

1

Scenario Name

The fastest Neural Network visualization

Business Goals

Offer a rapid visualization of Neural Network model run data.

Quality Attributes

Performance

Stimulus

A user has run a TensorFlow model on data, has output results in the correct way, and would like to see the results in the TensorBoard UI immediately.

Stimulus Source

User

Response

TensorBoard prioritizes renderer performance through efficient data processing and visualization rendering modules and technologies.

Response Measure

A user loads data from TensorFlow; the summary of the data is shown in less than 0.05 seconds.

Scenario Priority

2

Scenario Name

Ensuring all TensorBoard data types are supported

Business Goals

Enable TensorBoard to handle all of the data types that TensorFlow can handle

Quality Attributes

Usability

Stimulus

User brings a variety of Neural Networks into TensorFlow, any of which might operate against video, audio, text, photo, or other media types.

Stimulus Source

User

Response

TensorBoard provides modules that facilitate visualization of all the input dataset data types.

Response Measure

TensorBoard handles any standard medium of data, and lets users visualize and interact with it.

Scenario Priority

4

Scenario Name

Fault tolerance and handling

Business Goals

Ensure that, despite faults or errors, TensorBoard continues to provide a good user experience, and that when fatal errors occur, good logging helps customers debug the problem

Quality Attributes

Reliability

Stimulus

User feeds data into TensorBoard that is damaged or misconfigured, or user interacts with TensorBoard in breaking ways.

Stimulus Source

User

Response

Ensure TensorBoard contains default error logging modules, UI elements that provide useful error feedback, and contains graceful failure procedures that don't result in fatal errors. If fatal errors occur, indicate this and direct user to log files.

Response Measure

When non-fatal errors occur, TensorBoard provides feedback for 99% of data and user-interaction cases. When fatal errors occur, users can see a full history of the application run in logs, accessible directly from the error window that appears.

3.4 Utility Tree

Figure 2.1: The utility tree for TensorBoard

4.0 Module View

A module view shows the elements, relations, constraints, and usages of the modules that comprise the project. A module view is advantageous because it gives a broad overview of the major relations between individual modules. This allows for a greater level of maintainability, as it lets a new developer quickly become accustomed to the architecture and derive what impacts making changes will have.

4.1 Primary Presentation

Figure 4.1: Primary presentation showing a modular view of the TensorBoard project

The main QAS described previously concerns usability: TensorBoard can completely analyze and visualize the result data from TensorFlow's multimedia data training. After TensorFlow saves different types of data to the data log, the Event Processing module in TensorBoard processes this data. The data can then be used by the Plugins module and shown in the internet browser.

The second QAS related to the primary presentation concerns maintainability: TensorBoard can be accessed and run without the influence of TensorFlow. As shown in the diagram, the TensorFlow and TensorBoard modules are separated, so changes to TensorFlow will not influence TensorBoard. The only interaction is that TensorBoard uses the data from the Data Log module, which is generated by TensorFlow. If TensorFlow sends new data to the Data Log module, the user can load the new data if they want to visualize the new dataset.
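The data flow just described can be sketched as a staged pipeline; every function below is a hypothetical stand-in for the corresponding module in Figure 4.1, not real TensorBoard code:

```python
def write_data_log(run_results):
    """TensorFlow side: append raw results to the shared data log."""
    return [("event", value) for value in run_results]


def event_processing(data_log):
    """Backend: preprocess raw events into plugin-ready records
    (a toy doubling transform stands in for real preprocessing)."""
    return [value * 2 for kind, value in data_log if kind == "event"]


def plugins(processed):
    """Plugins: turn processed records into renderable artifacts."""
    return {"scalars": processed}


def serve_to_browser(artifacts):
    """Web server: hand the artifacts to the browser for display."""
    return f"rendering {len(artifacts['scalars'])} scalar points"


# TensorFlow and TensorBoard only share the data log, so either side can
# change independently as long as the log format stays stable.
page = serve_to_browser(plugins(event_processing(write_data_log([1, 2, 3]))))
```

Note that TensorFlow appears only in the first stage: the remaining stages read from the log, which is what lets TensorBoard run without influencing, or being influenced by, a running TensorFlow model.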

4.2 Element Catalog

  • TensorBoard: A suite of web applications to visualize TensorFlow runs and graphs. The main functionalities are described in the Plugins module.

  • TensorFlow: TensorFlow is a tool for machine learning, mainly designed for deep neural network models. This module sends the data that users want to visualize in TensorBoard to the Data Log module.

  • Data/Event Log: Receives data from TensorFlow and saves the data in the log file.

  • Backend: The backend contains two partitions, Application and Event Processing

    • Application: This module constructs TensorBoard as a WSGI application, handles serving static assets, and implements TensorBoard data APIs. It reads the data from Event Processing and uses the modules in Plugins to implement the visualization.

    • Event Processing: Loads the data from the Data Log module and processes it for use by the Application module.

  • Plugins: Provides the core features of TensorBoard; processes the results of event processing and provides APIs that can be used by the Application module.

    • Audio: Responsible for visualization of audio recognition

    • Debugger: Offers a graphical interface to the TensorFlow debugger; allows users to pause and resume execution, visualize values of tensors over time, and associate tensors with Python code.

    • Distribution: Another way to visualize histogram data

    • Graph: Displays the computational graph of the TensorFlow model

    • Histogram: Shows how the statistical distribution of a Tensor has varied over time

    • Image: Displays the images that are saved in TensorFlow.

    • Projector: Visualizes data embedded in a high-dimensional space

    • Scalar: Visualizes scalar statistics that vary over time, such as a model's loss or learning rate.

    • Text: Displays text snippets saved in TensorFlow; supports features such as hyperlinks, lists, and tables.

  • Web Server: Serves TensorBoard's web pages to the browser via WSGI (Web Server Gateway Interface).

  • Internet Browser: Users view TensorBoard's web pages using a browser.
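
The plugin contract described in this catalog can be sketched in miniature. The following is a hypothetical, simplified illustration; TensorBoard's actual plugin base class (in tensorboard/plugins/base_plugin.py) differs in detail, and all names here are illustrative:

```python
# Hypothetical sketch of the plugin contract described above; names are
# illustrative, not TensorBoard's actual API.
class TBPluginSketch:
    """Base contract: every visualization plugin exposes a name,
    a set of web routes, and an activity check."""
    plugin_name = None

    def get_plugin_apps(self):
        """Map URL routes to handlers served by the Application module."""
        raise NotImplementedError

    def is_active(self):
        """Whether this plugin has data to show for the loaded runs."""
        raise NotImplementedError


class ScalarPluginSketch(TBPluginSketch):
    plugin_name = "scalars"

    def __init__(self, multiplexer):
        # The multiplexer stands in for the event-processing layer's
        # per-run view of the loaded data.
        self._multiplexer = multiplexer

    def get_plugin_apps(self):
        return {"/scalars": self._serve_scalars}

    def is_active(self):
        return bool(self._multiplexer)

    def _serve_scalars(self, request):
        return {"runs": list(self._multiplexer)}
```

The Application module would merge each plugin's route map into the overall WSGI routing table, which is how new visualizations are added without touching the core.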

4.3 Context Diagram

The TensorBoard context diagram is quite simple. TensorBoard interfaces directly with only two other systems: a browser, which is served web pages by the TensorBoard server, and an event file that contains the output specified within the TensorFlow model. These TensorFlow run summary event files contain all the data required for TensorBoard to render model graphs, data source visualizations, and classification comparisons. Together, these systems provide a low-latency and intuitive interface for the end user, separate concerns between clearly defined interfaces at all stages, and enable the model and classification introspection that makes TensorBoard useful.

Figure 4.2: Context view showing the interaction of external entities and their usage dependencies

4.4 Sequence Diagram

The sequence diagram shows that the event data is preprocessed before display, so when the user opens the TensorBoard page, the result graphs are displayed in a short time. In this diagram, the two-stage flow the user experiences between TensorFlow and TensorBoard is highlighted by the discontinuity in the sequences. We see that the event log is used to pass information between TensorFlow and TensorBoard, and how that data is used to render a useful view for the user. This process iterates until the user sees their neural network perform as desired.

Figure 4.3: Sequence diagram of TensorBoard outlining that the event data was preprocessed before being displayed to the user

4.5 Rationale

The main QAS described in the primary presentation is usability: TensorBoard can completely analyze and visualize the result data from TensorFlow's multimedia training runs. After TensorFlow saves different types of data to the Data Log, the Event Processing module in TensorBoard processes this data. The data can then be used by the Plugins module and shown in the internet browser.

The second QAS related to the primary presentation is maintainability: TensorBoard can be accessed and run without the influence of TensorFlow. As shown in the diagram, the TensorFlow and TensorBoard modules are separated, so a change to TensorFlow will not affect TensorBoard. The only interaction is that TensorBoard uses the data from the Data Log module, which is generated by TensorFlow. If TensorFlow sends new data to the Data Log module, users can load the new data whenever they want to visualize the new dataset.

TensorBoard was created as a suite of visualization tools to make it easier to understand, debug and optimize TensorFlow neural network models by revealing their internal structures and processes. The implementation units and responsibilities are focused on the properties of four main quality attribute scenarios:

  1. TensorBoard can be accessed and run without the influence of TensorFlow with 99% accuracy under normal circumstances. (Manageability/Maintainability)

  2. A fault occurs, and the system is able to catch and handle the exception, while continuing on with the processing of data. A log is generated, and functionality continues. (Supportability/Reliability)

  3. A user has a TensorFlow model which contains a .proto file; TensorBoard can provide an API to analyze the data in one second. (Interoperability)

  4. A user loads data from TensorFlow, and the summary of the data is shown in less than 0.05 seconds. A user trains against 55,000 MNIST items and the display time is less than three seconds. (Performance)

QAS 1 addresses the separate management of TensorFlow and TensorBoard under separate repositories on GitHub, allowing computation using data flow graphs for machine learning to be handled by TensorFlow, and the visualizations by a separate toolkit, TensorBoard. To address QAS 2, TensorBoard can log events interactively, as well as monitor output over time to track things like learning rate, loss values, and testing accuracy. TensorBoard operates by reading a summary of data generated by TensorFlow known as event files, which compact the relevant information into an easy-to-consume format for TensorBoard.

Open source software is peer-reviewed, which tends to make software systems more reliable and diverse, and increases the usage of the tool, as it is free and widely available. The plug-in system of adding separate features allows for easy customization and modification. The ASR for QAS 1 states that the maintenance of TensorBoard must be independent of TensorFlow, i.e., it can be installed and maintained separately, as seen in the primary presentation diagram. Separate entities allow for separate maintainers and management, which allows for compartmentalized support from maintainers, developers, and support staff.

A TFRecords file contains protocol buffers, the recommended standard format for TensorFlow and TensorBoard. To mix and match data sets and network architectures, this standard format allows TensorBoard to provide APIs to analyze the given data, as described in QAS 3. Having a standardized format is integral to the visualization of complex computational graphs; the format is easily interpretable and carries the information that event processing needs.
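
For illustration, a scalar summary in this standard format is essentially a tagged value wrapped in a Summary message. The pure-Python stand-ins below are hypothetical and only mimic the shape of the real protocol buffer messages defined in TensorFlow's summary.proto:

```python
from collections import namedtuple

# Illustrative stand-ins for the Summary protocol buffer; the real
# messages are generated from TensorFlow's summary.proto definitions.
Value = namedtuple("Value", ["tag", "simple_value"])
Summary = namedtuple("Summary", ["value"])

def make_scalar_summary(tag, scalar):
    """Package a single scalar measurement in the tagged format that
    event files carry and TensorBoard plugins consume."""
    return Summary(value=[Value(tag=tag, simple_value=float(scalar))])
```

Because every producer and consumer agrees on this one schema, any tool that can read the proto can interoperate with TensorBoard, which is the essence of QAS 3.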

Assumptions and constraints on TensorBoard's architectural approach include knowledge of and reliance on TensorFlow. The tool was originally built as a complement to TensorFlow, and as such one can assume the users of TensorBoard to also be users of TensorFlow.

5.0 Component and Connector View

The component and connector (C&C) view represents elements that have a runtime presence. This view outlines the major pathways of interaction throughout the project. Components have ports that allow interaction with them through connectors, and a component itself may represent a complex subsystem. Pathways such as communication links, protocols, information flows, and access to shared storage are explicitly shown, which provides a rapid reference to the flow of information within the project.

5.1 Primary Presentation

Figure 5.1: TensorBoard latency QAS as seen through a pipe and filter view

5.2 Element Catalog

5.2.1 Elements and Properties

  • TensorFlow Event Data: Event data summaries are saved in the event file directory by TensorFlow. This is a directory in the operating system file structure that gets populated with summary files which TensorBoard polls continuously. The data is then read by the event generator of TensorBoard.

  • Event Generator: The event generator searches and loads a sequence of paths with event files. It only loads the data from one path at a time. After all the events in the path are loaded, the generator will move on to the next path. After the first loading, this element will only reload the new data when users update the event file rather than load all the data again, which helps to reduce the processing latency. This is the entry point of data into TensorBoard.

  • Event Processor: The responsibility of this filter is to process the event data that has been loaded by the generator. Before processing events, this filter purges the orphaned data from the pending data. Then it breaks the data down based on plugin type, such as graph, histogram, etc. After this decomposition, different plugins can run at the same time. The total time to show the data is reduced by the parallel execution of these processes.

  • Graph Implementation: This filter transforms the data into a form that can be shown in the graph UI, and filters out attributes that are too large to be shown in the UI, which helps to ensure the graph can be shown within an acceptable time. This is the exit point of data from TensorBoard’s event processing pipeline.

  • Graph Data for Visualization: The data is ready to be visualized in the UI, and is sent to the browser via sockets.
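
The incremental-loading behaviour of the Event Generator described above can be sketched as follows. This is an illustrative simplification, not TensorBoard's actual implementation (which lives in its directory-watching and event-file-loading modules): it remembers a byte offset per file and yields only data that has not been seen before.

```python
import os

class EventGeneratorSketch:
    """Simplified sketch of incremental loading: remember the byte
    offset already consumed per file and yield only new records on
    each polling cycle. Illustrative only."""

    def __init__(self, logdir):
        self._logdir = logdir
        self._offsets = {}  # path -> bytes already consumed

    def load(self):
        """Yield (path, new_bytes) for every file with unseen data."""
        for name in sorted(os.listdir(self._logdir)):
            path = os.path.join(self._logdir, name)
            start = self._offsets.get(path, 0)
            with open(path, "rb") as f:
                f.seek(start)
                data = f.read()
            self._offsets[path] = start + len(data)
            if data:
                yield path, data
```

A second call to load() after no writes yields nothing, which is exactly the latency-saving property the Event Generator relies on: unchanged files cost almost nothing to re-poll.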

5.2.2 Relations and Properties

The relation in the pipe-and-filter view is the attachment between pipes and filters. It is shown in the primary presentation as arrows describing data passing from the outputs of elements to the input interfaces of subsequent elements in the data processing pipeline. In each filter, the pending data is filtered to decrease latency by removing unnecessary or unusable content, and optimized for memory consumption. The data flow is unidirectional: from the TensorFlow output files, through the data processing pipeline defined within TensorBoard that converts and filters the data for consumption by the visualizations, to rendering in the client browser in various graphs. The data format traversing each pipe section changes depending on the stage in the processing pipeline and the visualizations being supported.

5.2.3 Elements Interfaces

Loading Events from a Directory File

This interface handles reading in events from TensorFlow's output files. Summative event information is output into the directory by TensorFlow and automatically picked up by TensorBoard's directory watching system. This file watching system discovers files as they are created in the directory, reads them, and queues the events so that each is yielded into the rest of TensorBoard exactly once.

  1. Interface Identity: Load Events from Directory

  2. Resources: Load()

For each resource:

  • Syntax: Python-style, DirectoryWatcher.Load()

  • Semantics:

    1. Precondition: Directory exists and contains valid event files.

    2. Postcondition: Yields all values that have not yet been yielded.

  • Error Handling:

    1. DirectoryDeletedError(): indicates that the directory has been deleted.

  3. Data Types and Constants: The data type is an event object containing data generated by TensorFlow, such as a Python tensor format that contains scalars, histograms, audio, or images, among others. No constants.

  4. Error Handling: DirectoryWatcher.Load() raises DirectoryDeletedError() if the directory has been permanently deleted. Control of the program is then returned to the calling function.

  5. Variability: This interface enables variability in a number of ways. The system can handle files appearing and disappearing, and reacts appropriately by ingesting or ignoring events respectively. The events themselves are allowed to vary in quantity and content, and the system will ingest all of the events without duplication.

  6. Quality Attribute Characteristics: This function generates events from the file source. Should this function raise an error, the overall performance of TensorBoard would suffer greatly.

  7. Rationale and Design Issues: This interface represents a straightforward approach to loading data into the program. Performance gains could be realized by using multithreading to load the data, though there would be a complexity tradeoff.

  8. Usage Guide:
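
As a hedged usage sketch (not TensorBoard's actual calling code; the DirectoryDeletedError stub below stands in for the real exception class), a caller honouring this interface's error contract might look like:

```python
class DirectoryDeletedError(Exception):
    """Stand-in for the interface's error: the watched directory
    has been permanently deleted."""

def drain_events(watcher):
    """Consume every not-yet-yielded event from watcher.Load(),
    honouring the error contract: on DirectoryDeletedError, stop
    and return control to the caller with whatever was read."""
    events = []
    try:
        for event in watcher.Load():
            events.append(event)
    except DirectoryDeletedError:
        pass  # directory gone; the caller decides what to do next
    return events
```

The key point of the contract is that Load() yields each event at most once, so repeatedly draining the watcher never duplicates work.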

Processing and Prepare to Load Events into Plugins

This interface handles the processing of TensorFlow events that have been loaded into TensorBoard by a DirectoryWatcher object. Orphaned events are purged and the remainder are categorized by their field and processed accordingly.

  1. Interface Identity: Process and Prepare to Load Events into Plugins

  2. Resources: _ProcessEvent()

For each resource:

  • Syntax: Python-style, _ProcessEvent(event)

  • Semantics:

    1. Precondition: A valid event must be passed as the single parameter.

    2. Postcondition: Event is either purged due to being an orphaned event by _MaybePurgeOrphanedData(event) or processed as a graph, meta_graph, tagged_metadata, or summary event.

  • Error Handling:

    1. tf.logging.warn() indicates a new file_version for event.proto has been found. Overwrite with newest file_version.

    2. tf.logging.warn() indicates more than one graph event per run or a metagraph containing a graph_def was found. Overwriting with newest event.

    3. tf.logging.warn() indicates that more than one metagraph event per run was found. Overwriting with newest event.

    4. tf.logging.warn() indicates multiple metagraphs containing graph_defs but no graph events found. Overwriting graph with newest metagraph event.

    5. tf.logging.warn() indicates more than one "run metadata" event with tag was found. Overwriting with newest event.

    6. tf.logging.warn() indicates that the summary with tag is not associated with a plugin. Carry on with no action.

  3. Data Types and Constants: The data type is an event object containing data generated by TensorFlow, such as a Python tensor format that contains scalars, histograms, audio, or images, among others. No constants.

  4. Error Handling: Overall, there is no true error handling found within _ProcessEvent(self, event), since this function processes and queues events as required. Any error handling has been done by the functions that call _ProcessEvent, such as ensuring that the event passed in is valid. The function does make consistent use of logging all faults.

  5. Variability: The _ProcessEvent function is able to handle a large degree of variability in the event object contents. The interface accepts an event object, determines the type of event, whether it be a graph, metagraph, tag, or summary, and processes it accordingly. The primary processing is the addition of a tag into a list. If the user knows that the amount of data sent to the client browser will be excessive, they can optionally throttle the maximum memory use through the size_guidance parameter, which overrides the default of unlimited use. This helps avoid overloading the client and reduces interactive latency there.

  6. Quality Attribute Characteristics: The function call to _MaybePurgeOrphanedData(event) provides a means for performance gains, as it effectively reduces the total number of events to be processed by removing any events associated with a failed TensorFlow run.

  7. Rationale and Design Issues: _MaybePurgeOrphanedData() is a logical implementation at this junction. Removing unnecessary data provides a cleaner and faster result.

  8. Usage Guide:
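
The size_guidance throttling mentioned under Variability can be sketched as a per-tag cap. This is an illustrative stand-in only; TensorBoard's real event accumulator uses reservoir sampling, and the function below merely mimics the idea of keeping a bounded but representative subset (with 0 meaning "unlimited", as in size_guidance):

```python
import random

def cap_items(items, limit, seed=0):
    """Keep at most `limit` items, sampled uniformly so the retained
    points still span the whole run. A limit of 0 means unlimited.
    Crude stand-in for reservoir sampling behind size_guidance."""
    if limit == 0 or len(items) <= limit:
        return list(items)
    rng = random.Random(seed)
    kept = sorted(rng.sample(range(len(items)), limit))
    return [items[i] for i in kept]
```

Capping per tag bounds the memory held per run and the payload shipped to the browser, which is how the throttle reduces interactive latency.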

5.2.4 Element Behaviour

According to the elements indicated above, the state chart describes the possible behaviour of the system. TensorFlow event files can be generated continuously across different runs. The Event Generator is a thread in TensorBoard which continuously reads the event files in short cycles. To reduce latency, the Event Generator does not load event files which have already been loaded. After the event files are loaded, the Event Processor processes the event data. In this process, it discards the orphaned data generated by TensorFlow restarts, which screens out the valuable data and avoids processing useless data. Once the data is purged, the Event Processor formats the event data into data dictionaries. If the size of the data exceeds the limit, the large items are labelled with an attribute to ensure the graph will not show the entire large dataset. This greatly decreases the graph display time for complex training models.

Figure 5.2: View into the behaviour of TensorBoard elements

5.3 Context Diagram

TensorFlow creates summary data during its training process and saves the data into the event file directory. It generates serialized protocol buffers and saves the data in a specified path. TensorBoard processes these event files and visualizes this serialized data from TensorFlow runs in the local browser.

Figure 5.3: TensorBoard within its runtime environment

5.4 Variability Guide

The system allows the addition or removal of plugins, such as the Graph component, to provide different views of the training process. These include histogram, distribution, scalar, etc. For example, consider training a convolutional neural network to recognize MNIST digits. The goal is to visualize how the learning rate varies over time and how the objective function is changing. It is necessary to extend the system with the Scalar plugin to process the scalar summary data from the event files for visualization. In this way, the system enables users to vary the perspectives they have on their TensorFlow data. This is the primary way the system architecture handles usage variability. Due to the inherent flexibility of the plugin architecture used for the visualizations and event data types, it should be simple to expand the current set of supported plugins.
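
The Scalar plugin scenario above boils down to a stream of (tag, step, value) records. The sketch below is illustrative only: in real TensorFlow 1.x code each record would be written to the event file via the summary-writing APIs rather than collected in a list.

```python
def log_scalars(train_steps, lr_schedule):
    """Sketch of the (tag, step, value) stream the Scalar plugin
    visualizes. In real code each tuple would be serialized into the
    event file by the summary writer instead of appended to a list."""
    records = []
    for step in range(train_steps):
        records.append(("learning_rate", step, lr_schedule(step)))
    return records

# Example schedule: exponential decay, a typical Scalar-dashboard curve.
decay = lambda step: 0.1 * (0.96 ** step)
```

Because each record is just a tagged, timestamped scalar, the Scalar plugin can plot any such series without knowing anything about the model that produced it.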

5.5 Rationale

TensorBoard was designed with an adaptive plugin-based architecture because it requires high levels of interoperability, maintainability, and performance. Flexibility and adaptability are necessary in terms of implementation platforms. These quality attributes directly impact our quality attribute scenario: a user loads data from TensorFlow, and the summary of the data is shown in less than 0.05 seconds. After exploring TensorBoard more thoroughly, we updated our expectations to reflect realistic latency performance. The refined quality attribute scenario is: a user loads data from TensorFlow, and the summary of the data is shown in less than 3 minutes.

  • Interoperability: the exchange of information and communication between TensorBoard and TensorFlow is essential. Although the exchange stays within internal systems, the exchange and reuse of information is essential to the QAS, in terms of reducing reliance on third parties.

  • Maintainability: the ease with which a system's components, functions, and features can be changed is important for iterating on issues. The plug-in based architecture allows for a separation of event processing, thus separating the tasks running at any one time.

  • Performance: the responsiveness of TensorBoard in executing its various plug-ins allows individual functionality to run without unnecessary components, as well as parallel execution of necessary tasks. At the same time, the event processor breaks data down by plugin type and runs the different necessary and related plugins together to reduce overall processing time.

Major components that directly affect latency are the Content and Limit Graph Size components, which contain the protocol buffers and filter elements that ensure the size of the TensorFlow input is manageable and does not immediately threaten the latency requirements. TensorBoard also has some fault-tolerant processes in place to avoid performance latencies: the purge-event process maintains a consistent view in TensorBoard by purging out-of-order events resulting from a TensorFlow crash.

TensorFlow is designed to preprocess the data being input into TensorBoard before it is transferred to a local browser. For a given TensorBoard operation, TensorBoard receives the event data from TensorFlow and filters it through three main components:

  • Event Generator: loads the data from one path at a time. This reduces latency by only requiring the user to update the event file(s), as opposed to loading all the data again. This was also a conscious effort to support the iterative process of data manipulation and optimization and the fluidity between TensorFlow and TensorBoard.

  • Event Processor: filters the event data by decomposing it for the various plugins. The ability to run plugins in parallel is preferable for meeting latency requirements.

  • Graph Implementation: transforms data into a graphical UI representation. It filters out attributes that are too large to be shown, to ensure latency requirements are met. This decision improves the UI and UX, and ensures a reliable and usable interface and experience.

The data summarization relevant to these system areas must occur in under 3 minutes to fulfill our quality attribute scenario and the expectations of users. Assumptions and constraints on TensorBoard's architectural approach include knowledge of and reliance on TensorFlow. The tool was originally built as a complement to TensorFlow, and as such one can assume the users of TensorBoard to also be users of TensorFlow. The current implementation is optimal for the high functionality of TensorBoard and in terms of our QAS.

Please note our assumption: as TensorBoard contains various plugins, this documentation is based on the assumption that the user only uses the Graph plugin, which is one of the core visualizations.

6.0 Technical Debt

Technical debt is what results when short-term decisions are made without considering their long-term consequences in software development. It can include software architecture decisions, testing left undone, and a lack of documentation, among numerous other types. The following section describes the technical debt discovered by manually and programmatically analyzing the TensorBoard software project.

6.1 SonarQube Analysis

During the analysis, many Pylint suppression comments were found in the code of TensorBoard, such as:

# pylint: disable=arguments-differ

# pylint: disable=line-too-long

As a result, it appears that the development team has used Pylint as their preferred code quality tool. Therefore, another tool, SonarQube, was employed to scan the static code to identify issues that Pylint did not detect. Here is a link to our live cloud instance: https://sonarcloud.io/organizations/nathansun1981-github/projects .

Compared to other open source software on GitHub, TensorBoard contains high quality code. According to the static code analysis, there are no bugs and few vulnerabilities. Though there are some code smells, they are much less prevalent than in other software. The major problem with TensorBoard is duplicated code. Both the local SonarQube server and SonarCloud were used to analyze TensorBoard, and both tools reached the same conclusion. Moreover, both integrated rule sets, Sonar way and Pylint, were used, and each provides slightly different results. To classify the existing problems, the results of the scans are analyzed below.

Figure 6.1: Report from local SonarQube Server for all projects

Figure 6.2: Report from SonarCloud for all projects

Figure 6.3: Report from Local SonarQube Server for TensorBoard

6.1.1 Function Names

One of the most significant problems in the TensorBoard code is that the function names are arbitrary; they do not follow proper naming conventions. If the names are not meaningful, other developers will spend more time reading the code to determine exactly what certain methods do. This has a cost in modifiability and maintenance, and increases the possibility of misunderstandings and mistakes. There should be a standard that unifies the names of functions; shared coding conventions allow teams to collaborate efficiently.

For example, plugin_event_accumulator.py::Graph() does not match the naming conventions; it should be renamed to something like generate_graph() to clarify the method's meaning for other developers and testers. As another example, plugins/audio/summary.py::pb() is very difficult for other developers to understand if no comments explain what pb means.

A similar problem is that some functions, methods, and lambdas in TensorBoard have too many parameters. A long parameter list can indicate that a new structure should be created to wrap the numerous parameters, or that the function is doing too many things.

For example, the method op() in plugins/audio/summary.py has 9 parameters and the method pb() in the same file has 8 parameters. The developers should group related parameters into a single structure to reduce the count.
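
One hedged way to shrink such a parameter list is to bundle the optional settings into a single configuration object. The names below are illustrative, not the actual op()/pb() signatures from plugins/audio/summary.py:

```python
from dataclasses import dataclass

@dataclass
class AudioSummaryConfig:
    """Bundles the optional knobs so call sites pass one object
    instead of eight or nine positional arguments (illustrative)."""
    sample_rate: int = 44100
    max_outputs: int = 3
    encoding: str = "wav"
    display_name: str = ""
    description: str = ""

def audio_op(name, audio_data, config=None):
    """Two required arguments plus one config object, instead of a
    long positional parameter list."""
    config = config or AudioSummaryConfig()
    return {"name": name, "data": audio_data,
            "sample_rate": config.sample_rate,
            "max_outputs": config.max_outputs}
```

Call sites that only need the defaults stay short, and adding a new option later does not break every existing caller.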

6.1.2 Methods and Field Name Capitalization

Looking at the set of methods and fields in a class and finding two that differ only by capitalization is confusing to users of the class. This situation may simply indicate poor naming. Method names should be action-oriented, and thus contain a verb, which is unlikely in the case where both a method and a field have the same name (with or without capitalization differences). However, renaming a public method could be disruptive to callers, so renaming it in an explicit way is the recommended action. For example, rename the method "Items" to something explicit such as "GetItems()" to prevent any misunderstanding or clash with the field "items" appearing in the same context.
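
The recommended rename can be shown with a minimal, hypothetical class:

```python
class RunRegistrySketch:
    """Hypothetical illustration of the naming clash and its fix."""

    def __init__(self):
        self.items = []  # field: lowercase noun

    # Before: a method named Items() differed from the field `items`
    # only by capitalization. After: an action-oriented name with a
    # verb, which cannot be confused with the field.
    def GetItems(self):
        return list(self.items)
```

After the rename, reading `registry.items` versus `registry.GetItems()` makes it immediately clear whether a field access or a method call is intended.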

6.1.3 Cognitive Complexity of the Code

Cognitive complexity is a measure of how hard the control flow of a function is to understand. Functions with high cognitive complexity are difficult to interpret and maintain. For example, in the method projector_plugin.py::_augment_configs_with_checkpoint_info(), there are four for-loops, three of which are nested inside other loops. This deep nesting greatly increases the cognitive load, making the code less understandable and more difficult to modify.
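
A generic way to reduce such nesting (a sketch, not the actual projector_plugin code; the config structure below is hypothetical) is to flatten the loops into a single comprehension with the same result:

```python
def find_checkpoint_tensors(configs):
    """Deeply nested version: a loop inside a loop inside a loop."""
    found = []
    for config in configs:
        for embedding in config.get("embeddings", []):
            for tensor in embedding.get("tensors", []):
                if tensor.get("has_checkpoint"):
                    found.append(tensor["name"])
    return found

def find_checkpoint_tensors_flat(configs):
    """Equivalent logic in one comprehension: the same result with
    flatter control flow and lower cognitive complexity."""
    return [t["name"]
            for c in configs
            for e in c.get("embeddings", [])
            for t in e.get("tensors", [])
            if t.get("has_checkpoint")]
```

Extracting the inner loops into well-named helper functions is an equally valid fix when the loop bodies carry more logic than a filter-and-collect.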

6.1.4 Merging Collapsible "if" Statements

Merging collapsible if statements increases the code's readability. For example, in plugin_event_multiplexer.py::AddRun(path, name=None), the outer if statement can be merged with the nested one.

Noncompliant Code:

if accumulator:
    if self._reload_called:
        accumulator.Reload()

Compliant solution:

if accumulator and self._reload_called:
    accumulator.Reload()

6.1.5 Reducing Code Duplications

There is substantial duplicated code in the project. When code containing a software vulnerability is copied, the vulnerability may continue to exist in the copied code. If a developer is not aware of the vulnerability in the copied code, they will have introduced it into their own code. Refactoring duplicate code can improve many software metrics, such as lines of code, cyclomatic complexity, and coupling. This may lead to shorter compilation times, lower cognitive load, less human error, and fewer forgotten or overlooked pieces of code. For example, def GetLogdirSubdirectories(path) is duplicated in plugin_event_multiplexer and event_multiplexer. This function does not rely on a class, so it can be refactored so that both modules use a single shared definition.
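
The suggested refactoring is the standard extract-function pattern: define the class-independent helper once in a shared module and import it from both multiplexers. The module and function names below are illustrative, not TensorBoard's actual layout:

```python
# shared_io.py (illustrative shared module)
import os

def get_logdir_subdirectories(path):
    """Class-independent helper, defined once and imported by both
    plugin_event_multiplexer and event_multiplexer instead of being
    copy-pasted into each."""
    return sorted(
        entry for entry in os.listdir(path)
        if os.path.isdir(os.path.join(path, entry)))

# In each multiplexer module, the duplicated definition is replaced by:
#     from shared_io import get_logdir_subdirectories
```

A fix to the helper (say, handling a permission error) then lands in one place instead of needing to be replicated, tested, and reviewed twice.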

However, not all code duplication can be refactored. The risk of breaking code when refactoring may outweigh any maintenance benefits, and duplicated code does not seem to be significantly more error-prone than unduplicated code. Duplication can also be reduced by using open source technologies for sharing code components instead of duplicating them between SCM repositories. For example, def AddRun(self) is duplicated in plugin_event_multiplexer and event_multiplexer; because this function relies on self, which depends on the class, it cannot easily be refactored.

Another problem with the code duplication in TensorBoard is that two branches of a conditional structure should never have exactly the same implementation.

For example, in plugins/debugger/events_writer_manager.py::write_event(event):

if not tf.gfile.Exists(file_path):
    self._events_writer.Close()
    self._events_writer = self._create_events_writer(
        self._events_directory)
elif tf.gfile.Stat(file_path).length > self._single_file_size_cap_bytes:
    self._events_writer.Close()
    self._events_writer = self._create_events_writer(
        self._events_directory)

Having two branches of the same if structure with the same implementation is at best duplicate code and at worst a coding error. If the same logic is truly needed for both cases, the branches should be merged into one; otherwise, one of the implementations should be changed.
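
Applying the merge to the snippet above reduces the two identical branches to a single condition. The sketch below is self-contained for illustration, with plain arguments standing in for the tf.gfile.Exists / tf.gfile.Stat calls:

```python
SINGLE_FILE_SIZE_CAP_BYTES = 100 * 1024 * 1024  # illustrative cap

def needs_new_writer(file_exists, file_length):
    """Merged condition from the snippet above: both original
    branches closed and recreated the events writer, so one combined
    test suffices. `file_exists` and `file_length` stand in for
    tf.gfile.Exists(file_path) and tf.gfile.Stat(file_path).length."""
    return (not file_exists) or file_length > SINGLE_FILE_SIZE_CAP_BYTES
```

With the condition merged, the close-and-recreate logic appears exactly once, so it cannot silently diverge between the two branches in future edits.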

6.2 CodeScene Analysis

CodeScene identifies patterns in the evolutionary timeline of the code. This provides insight for predicting future faults and potential technical debt. The analysis allows the prioritization of complex code and factors in bottlenecks, maintenance issues, and future limitations.

6.2.1 Hotspots

A Hotspot map identifies system packages (large blue circles) and provides a visual hierarchy of the smaller packages inside them (Figure 6.4). Hotspots identify maintenance problems and areas that deserve increased attention in code review. Hotspots are indicative of code churn, evaluating the rate of code evolution. The reported code churn for vz-projector.ts is 784%. This high percentage can be interpreted as the area of the project with the highest rate of change. This analysis allows us to prioritize the most heavily worked files and algorithms within the project. Hotspots are a preliminary metric for finding frequent development activity and narrowing down specific areas for future development and focus.

Figure 6.4: Hotspots report from CodeScene identifying the modules with most development activity

6.2.2 Refactoring Targets

The highlighted files take into consideration which aspects could be altered to give the largest return (Figure 6.5). A prioritized list of files is given, ordered by which files have the highest technical-debt interest rate and the highest return (Figure 6.6). The information in the refactoring targets is consistent with the Hotspots report.

Figure 6.5: Refactoring Target report from CodeScene

The highest priority file is currently vz-projector.ts. The function Projector.setupUIControls has the highest complexity/size ratio, at a value of 50 (Figure 6.6). Ideally, the complexity and lines of code should grow in a matching, linear trend to ensure the code remains maintainable. In Figure 6.7, the constant state of the revisions indicates minor modifications, as opposed to refactoring, which would show a large peak amidst relatively level revisions and complexity, while deteriorating code shows an increase in both complexity and revisions.

Figure 6.6: X-Ray File Results from CodeScene

Figure 6.7: Function Complexity Trend from CodeScene

Minor modifications are ideally done to stabilize the codebase, but with small changes comes the risk of those changes being expensive and potentially high in risk. With reference to these minor changes, TensorBoard developers need to give special attention to pieces of code that are continually changing, as this can be a signal that the problem domain is not fully understood, or that the code fails to model the problem well.

6.2.3 Code Age

Code Age in CodeScene is defined as "the time of the last change to the file", indicating the stability of the code. As shown in the diagram, most files in TensorBoard are unstable (shown as darker bubbles), meaning most files have been modified recently (Figure 6.8). Low stability results in a higher cost of maintainability, because stable code could be extracted as packages, minimizing the cognitive load for developers. Instead, the developers of TensorBoard have to understand much more code rather than just understanding an API. This unstable code can also lead to more testing work and increase the length of the system's delivery cycle.

Figure 6.8: Code Age Report from CodeScene

6.2.4 Internal Temporal Coupling

Internal temporal coupling refers to modules that change together over time, describing change patterns. An example of high internal temporal coupling is in vz-projector-bookmark-panel.ts. The code below shows a 37% similarity between Projector.getCurrentState and Projector.loadState; the highlighted portions mark the differences between the coupled functions. Both functions run two nearly identical for loops, a duplication that can increase technical debt in terms of run time and efficiency.

Figure 6.9: Function comparison to identify the internal temporal coupling

6.3 Technical Debt Recommendations

As mentioned previously, TensorBoard's code quality is high for an open source project. This is somewhat expected from Google projects: even strict automated code analysis tools have a hard time finding major issues or presenting recommendations. The majority of the recommendations below are developer-facing improvements, with a focus on code maintainability.

6.3.1 Duplications and Refactoring

Both SonarQube and CodeScene reveal a duplication issue within the TensorBoard code.

Technical debt: Many files and lines of code in this project are duplicated, which increases complexity. If one file needs to be modified, it is very likely that the duplicated file needs the same modification. Furthermore, the duplicated files increase the chance of error: a developer may modify one file and forget about the other, lengthening the testing time needed to find the mistake. Worse, if the error goes undetected, it will negatively influence the user experience.

Recommendation: Where possible, these duplicated files should be refactored and extracted into APIs or packages to reduce the cost of maintenance. Developers could then rely on the extracted APIs or packages instead of re-reading the underlying code, reducing their cognitive load.
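As a minimal sketch of this recommendation (function and module names are hypothetical, not TensorBoard's actual API), logic that is duplicated across two files can be extracted into one shared helper that both call, so a fix only needs to land in one place:

```python
# Hypothetical sketch: two modules once duplicated this normalization
# logic; extracting it into a shared helper removes the duplication.

def parse_event_tag(raw_tag):
    """Shared helper (illustrative): normalize a tag string."""
    return raw_tag.strip().lower().replace(" ", "_")

# Both former duplicates now delegate instead of re-implementing.
def accumulator_tag(raw_tag):
    return parse_event_tag(raw_tag)

def multiplexer_tag(raw_tag):
    return parse_event_tag(raw_tag)
```

With the helper in place, a change to the normalization rules is made once and both call sites pick it up, eliminating the forget-one-copy failure mode described above.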

6.3.2 Function Recommendation

According to SonarQube, the code has some issues related to function names and input parameters. For example, some method and field names are too similar, some function names do not match the project's naming convention (expressed as a regular expression), and some methods and lambdas take too many parameters.

Technical debt: Some method and field names differ only by capitalization, so they can be misunderstood and misused by developers, especially those who have just joined the project. Function names that do not match the naming convention lower the efficiency of collaboration. In addition, too many parameters indicate that a function does too much, making it harder to maintain than several small functions.

Recommendation: Methods and fields could be given clearly distinct names that match the naming convention. Parameter lists could be reduced, or related parameters extracted into a structure. Alternatively, large functions could be refactored into several smaller parts.
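The "extract parameters into a structure" suggestion can be sketched as follows; the names here are illustrative, not taken from TensorBoard's code:

```python
from dataclasses import dataclass

# Hypothetical sketch: a function with many loose parameters is easier
# to maintain when related arguments are grouped into one structure.

@dataclass
class ChartConfig:
    title: str
    width: int
    height: int
    log_scale: bool = False  # defaults shrink most call sites

def render_chart(config: ChartConfig) -> str:
    # One structured argument replaces four positional ones.
    scale = "log" if config.log_scale else "linear"
    return f"{config.title}: {config.width}x{config.height} ({scale})"
```

Call sites then read `render_chart(ChartConfig("loss", 640, 480))` rather than a string of anonymous positional arguments, and adding a new option no longer changes every caller's signature.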

6.3.3 Stability

According to the Code Age, Hotspots, and Refactoring Targets results from CodeScene, the TensorBoard code has low stability, and most of it has been modified recently.

Technical debt: In general, code should be as stable as possible. Low stability means the development team needs to maintain a working knowledge of that code for the lifetime of the system. Furthermore, if automated tests are used in the project, unstable code leads to unstable tests, since stable code would not have to be re-tested in every build.

Recommendation: A possible way to enhance stability is to extract and refactor the code into more APIs and packages, thereby reducing the cognitive load on developers.

6.3.4 Absence of a Dataset Module

According to issues 1002, 1013, 766, and 49 in TensorBoard's issue list, loading runs with large event files leads to high memory consumption. TensorBoard currently loads all runs into memory rather than using a database management system such as SQLite.

Technical debt: The performance of the project is reduced. On the one hand, this leads to latency; on the other hand, it may cause memory leaks. In the future, as the number of users increases, the memory consumption issue will become more pronounced, which could force the TensorBoard team to adopt a database management system instead of in-process memory.

Recommendation: A possible solution is to add a database module to the system instead of holding all run data in memory.
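A minimal sketch of what such a database module could look like, using Python's standard sqlite3 library; the schema and function names are assumptions for illustration, not TensorBoard's design:

```python
import sqlite3

# Hypothetical sketch: persist scalar events to SQLite instead of
# keeping every run in memory. Only the queried series is loaded.

def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scalars "
        "(run TEXT, tag TEXT, step INTEGER, value REAL)"
    )
    return conn

def add_scalar(conn, run, tag, step, value):
    # Parameterized insert; events stream to disk as they arrive.
    conn.execute("INSERT INTO scalars VALUES (?, ?, ?, ?)",
                 (run, tag, step, value))

def read_scalars(conn, run, tag):
    # Pull only the requested (run, tag) series into memory.
    cur = conn.execute(
        "SELECT step, value FROM scalars WHERE run=? AND tag=? ORDER BY step",
        (run, tag))
    return cur.fetchall()
```

With an on-disk path instead of ":memory:", memory use would be bounded by the queries served rather than by the total size of all event files.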

6.3.5 Compatibility of Multiple Browsers

There are many issues related to Firefox. For example, scalar charts become distorted on Firefox after toggling expand and log scale. Some code also handles Firefox specifically; for example, the snippet below simulates Chrome's outer glow on Firefox.

    /* Firefox: simulate Chrome's outer glow on a button when focused. */
    button:-moz-focusring {
        outline: none;
        box-shadow: 0px 0px 1px 2px Highlight;
    }

Technical debt: When developers write similar code, they need to consider the Firefox-specific case every time, which increases the cost of maintenance and produces duplicated code.

Recommendation: Ideally, different browsers should run the same code; browser-specific code should be kept to a minimum.

6.3.6 "TODO" Search and Analysis

Searching the TensorBoard repository for the string "todo" yields 77 in-code instances that appear to be unresolved. They include suggestions that are comical, as well as indications that many tests remain unwritten, but two are design-related issues:

  • One todo notes that unless the referenced code is wrapped in a try/catch, there is a chance of crashing long-running user scripts. Left unresolved, this technical debt decreases the reliability of the application for users running complex tasks and visualizations simultaneously.

  • Another todo indicates that the TensorBoard debugger system cannot handle JSON exceptions in a library-specific way and instead performs a catch-all. The todo calls for the creation of platform-dependent exception-catching modules or plugins, which would increase both the robustness and the clarity of exception handling in diverse deployment environments. Left unresolved, this technical debt could hinder developers' ability to access or understand the errors produced when debugging issues in JSON objects.
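The pattern both todos describe can be sketched in Python; the function name and log message are hypothetical, but `json.JSONDecodeError` is the standard library's specific exception for malformed JSON:

```python
import json

# Hypothetical sketch: catching the library-specific exception keeps a
# malformed payload from crashing a long-running process, while still
# surfacing useful context (unlike a bare catch-all).

def safe_parse(payload, default=None):
    try:
        return json.loads(payload)
    except json.JSONDecodeError as err:
        # A real plugin could log library-specific detail here rather
        # than swallowing every exception type in one generic handler.
        print(f"skipping malformed payload: {err.msg} at pos {err.pos}")
        return default
```

Catching only the decode error (instead of a blanket `except Exception`) is what gives callers both robustness and clarity: unrelated failures still propagate, while bad input is reported and skipped.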

6.3.7 Other Recommendations

  • For WSGI, the default is HTTP/1.0, which is an interesting design choice based on a contextual tradeoff: the TensorBoard development team must always specify the Content-Length header or use chunked encoding for streaming. If the team wants to update the HTTP version in the future to improve performance, compatibility between versions needs to be considered.

  • When a user re-runs a model with a different learning rate or layer size, the data from the previous run is discarded by TensorBoard's purging logic. This logic can speed up reading and reduce duplicated data, but only in the ideal case. In most real tasks it is natural to write multiple runs to the same run directory, for example because the code explicitly specifies a log directory and is executed multiple times. Empirically, TensorBoard silently discards some of the data so as to show only one run. A better strategy would be for TensorBoard to automatically detect when a run has restarted and separate it into a new run.
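The suggested restart detection could be sketched as follows; this is an assumption about one possible heuristic (a backwards-moving step counter signals a re-run), not TensorBoard's actual purging code:

```python
# Hypothetical sketch: when the global step moves backwards, the events
# likely come from a restarted run, so they are routed to a fresh run
# name instead of overwriting or purging the earlier data.

def split_restarts(events):
    """events: list of (step, value) pairs in arrival order.
    Returns {run_name: [(step, value), ...]} with restarts separated."""
    runs, current, last_step = {}, 0, None
    for step, value in events:
        if last_step is not None and step <= last_step:
            current += 1  # step went backwards: treat as a new run
        runs.setdefault(f"run_{current}", []).append((step, value))
        last_step = step
    return runs
```

Under this heuristic, two trainings written to the same log directory would surface as `run_0` and `run_1` rather than one run with silently discarded data.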

6.4 Technical Debt Summary and Conclusions

The technical debt analysis identifies key areas in the software that can be optimized to improve the project. Code duplication, function refinement, the stability of code changes over time, and browser compatibility are the main areas of technical debt analyzed in TensorBoard. The development of TensorBoard has likely been a fast-paced, complex, and laborious project, and complete optimality throughout is not expected. Although the issues above are the main ones found, they are considerably smaller than one would expect for a project of this size and complexity.

7.0 Pull Request

The pull request link is as follows:

https://github.com/tensorflow/tensorboard/pull/1109

7.1 Problem Description

According to the SonarQube and CodeScene reports, duplication is one of the main sources of technical debt as the project expands. In addition, SonarQube reports that some conditional structures contain two branches with the same implementation. The pull request tackles both issues.

Figure 7.1: High Duplications before refactoring

7.2 Solution

The duplication rate of this project reported by SonarQube is 9.2%. The main duplicated files are event_accumulator.py and plugin_event_accumulator.py, with a 40.1% duplication rate, and event_multiplexer.py and plugin_event_multiplexer.py, with a 67.5% duplication rate. The main reason is that each pair of files contains the same class, and most of the functions defined in the class are identical or highly similar; the duplicated files are simply used under different conditions. This problem can be solved by class inheritance, which eliminates the identical functions and overrides the similar ones. Therefore, one class was redefined as a child of the other, the duplicated functions were deleted from the child class, and functions with the same names but different implementations were kept as overrides. The files that call these classes were also modified accordingly to keep the project working.

Figure 7.2: High Duplications in files before refactoring
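The inheritance refactor described above can be sketched as follows; the class and method names are illustrative stand-ins, not TensorBoard's actual API:

```python
# Hypothetical sketch of the refactor: the "plugin" variant becomes a
# child of the original class, keeping only the methods that differ.

class EventAccumulator:
    def reload(self):
        # Identical in both files before the refactor; now lives once.
        return "reload shared between both accumulators"

    def tags(self):
        return ["scalars", "images"]

class PluginEventAccumulator(EventAccumulator):
    # reload() is inherited unchanged, so its duplicate body is gone;
    # only the method whose implementation differs is overridden.
    def tags(self):
        return ["scalars", "images", "plugins"]
```

Callers that previously imported one file or the other keep working, since the child exposes the same interface while the shared bodies exist in a single place.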

In addition, json_util.py and events_writer_manager.py contain conditional structures with two branches of the same if statement sharing the same implementation, so the duplicated branches were removed.
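A before/after sketch of this branch fix (with hypothetical conditions, since the report does not quote the original code): two branches with the same body collapse into one combined condition.

```python
# Hypothetical sketch: duplicated branch bodies merged into one branch.

def classify_before(value):
    if value < 0:
        return "invalid"
    elif value > 100:
        return "invalid"   # same body repeated in a second branch
    return "ok"

def classify_after(value):
    if value < 0 or value > 100:   # branches merged with `or`
        return "invalid"
    return "ok"
```

The behaviour is unchanged for every input; only the duplicated body disappears, which is why SonarQube flags such structures as removable duplication.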

After all the changes were finished, SonarQube was used to scan the project again. The duplication rate of the project decreased to 2.5%, and the duplication rates of plugin_event_accumulator.py and plugin_event_multiplexer.py were reduced to 21.4% and 17.5% respectively.

Figure 7.3: Reduce Duplication after refactoring

With the modified code, TensorBoard was successfully run using MNIST data. Since TensorBoard includes test files, these were also run in the local environment and all of them passed. However, after the pull request was submitted, 50 tests passed and 1 test failed in Travis CI (Continuous Integration).

The TensorBoard team responded quickly to the pull request. They declined it because the parallel files exist to maintain backwards compatibility for code within Google. They also acknowledged that the current setup is suboptimal and stated that they will redesign TensorBoard's backend in the future to obviate the duplicated logic.

8.0 Conclusions

This chapter summarizes an analysis of TensorBoard's architectural structure and components through high-level documentation. TensorBoard's visualization tool set provides a solution for those looking to optimize their TensorFlow programs and enrich their analysis through a suite of tools.

TensorBoard's modular development approach allowed an immersive insight into the project through digestible layers. The analysis of stakeholders, business goals, and architecturally significant requirements gave an understanding of the motivation and high-level goals behind the development of this open source software through the lens of the Google team. Generating the module view and the component-and-connector views, and analyzing technical debt on a well-developed and mature project, gave insight into the project's specific attributes. TensorBoard is a popular visualization tool and, even with the issues discussed above, will continue to be utilized in the fields of neural networks and visualization.

Note: If you are considering joining the TensorFlow and TensorBoard community and making contributions, please do! This chapter, together with the contribution guidelines, can assist your participation in and contributions to the community.

9.0 References

[1] Mané, Dandelion. "Tensorflow/TensorBoard." GitHub, 23 May 2017, https://github.com/tensorflow/tensorboard/blob/master/LICENSE.

[2] Mané, Dandelion. "Tensorflow/TensorBoard." GitHub, 5 October 2017, https://github.com/tensorflow/tensorboard/blob/master/DEVELOPMENT.md.

[3] Mayes, Robin James, Pamela Scott Bracey, Mariya Gavrilova Aguilar, and Jeff M. Allen. "Identifying Corporate Social Responsibility (CSR) Curricula of Leading US Executive MBA Programs." Handbook of Research on Business Ethics and Corporate Responsibilities (2015): 179-195.

[4] "TensorBoard: Visualizing Learning", https://www.tensorflow.org/get_started/summaries_and_tensorboard.

[5] "Companies using TensorFlow.", https://www.tensorflow.org/.

[6] martinwicke. "Tensorflow/TensorBoard/README." GitHub, 12 Jan 2018, https://github.com/tensorflow/tensorflow.

[7] "Insights:contributors", GitHub, https://github.com/tensorflow/tensorboard/graphs/contributors.

[8] Chen, Lianping, Muhammad Ali Babar, and Bashar Nuseibeh. "Characterizing architecturally significant requirements." IEEE software 30.2 (2013): 38-45
