Verification of Autonomous Systems Working Group Meeting

NRL, April 16-17, 2014

The original purpose of this meeting was to determine whether or not there existed a mathematical space that could ameliorate some of our state space problems. We ended up with a list of open research challenges (the starting point for the Challenges & Tools page). The report from that meeting is included below.

Meeting Report

Summary

The new Verification of Autonomous Systems Working Group held its kickoff meeting at NRL on April 16 and 17, 2014. At this meeting, the group concluded that there are extensive research challenges in this area and that a multi-disciplinary group is required to address them effectively. Participants agreed on common definitions for a number of potentially contentious terms and began outlining how to address their common problems.

Terminology

The terms defined below are the default definitions within the group. They are not intended to be propagated outside the group; they exist simply to streamline discussion within it. Any of these terms may be redefined for the purposes of a given discussion or contribution, as long as the person redefining it lets the rest of the group know.

The terms defined at this meeting are the default terms in the Terminology Wiki.

Additional terms defined during the discussions included autonomy (the freedom to choose to do something the operator might not have done, with the working assumption that if the mission is completed, the system worked) and cognitive element (which in the validation world refers primarily to the human side of the equation, but in the robotics world refers to the various elements that make up the perception and decision-making aspects of the robot). Users and operators are not the same: a user controls the system without understanding the underlying technology or decision-making capability, while an operator understands that underlying complexity and makes use of it. Users may become operators when they define new tasks for the system that the original designers did not envision; in that case, they are operators for the purposes of that task.

Key words identified as capturing why current verification tools do not adequately address autonomous systems were uncertainty, complexity, and unstructured: the robots operate in unstructured, complex environments, the correct action for the system to take at any given time is uncertain, and the interaction between the environment and the system is intractably complex.

Interdisciplinary Awareness

Several scenarios were identified and used as explanatory examples to highlight the complexity of the problem and help define the terminology.

Scenario 1: A robot that will go to the store and bring back a gallon of milk. This scenario let the group begin exploring different aspects of the problem by highlighting the distinction between explicit and implicit tasks: “get a gallon of milk” is an explicit task; “cross the street” is an implicit task that may or may not be part of the design specification, depending on the path to and from the store; and “don’t run over people” is an implicit task that should always be active and should be caught as part of the initial validation process. Several aspects of the problem need to be addressed, including whether we can verify that the robot has enough information to make a decision and how well characterized its sensors are.
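
As an illustration only, the sketch below (in Python, with hypothetical names such as State, Monitor, no_collision, and milk_delivered, and an assumed 0.5 m clearance threshold) shows one way to treat an explicit task as a goal condition while keeping an implicit safety task active as an invariant checked at every step.

```python
# Sketch: explicit tasks are goal conditions; implicit safety tasks are
# invariants that must hold at every step of the mission.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class State:
    holding_milk: bool = False
    at_home: bool = False
    min_distance_to_person_m: float = 10.0  # assumed to come from perception

def no_collision(state: State) -> bool:
    """Implicit, always-active safety invariant."""
    return state.min_distance_to_person_m > 0.5

def milk_delivered(state: State) -> bool:
    """Explicit task: goal condition from the mission specification."""
    return state.holding_milk and state.at_home

@dataclass
class Monitor:
    invariants: List[Callable[[State], bool]] = field(default_factory=list)

    def check(self, state: State) -> bool:
        # Invariants are checked continuously, not just at the goal.
        return all(inv(state) for inv in self.invariants)

monitor = Monitor(invariants=[no_collision])
print(monitor.check(State()))                               # True
print(monitor.check(State(min_distance_to_person_m=0.2)))   # False: invariant violated
print(milk_delivered(State(holding_milk=True, at_home=True)))  # True: explicit goal met
```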

Scenario 2: A team of robots (or a single robot) that will survey an area, where the survey consists of a low-resolution, long-range pass to identify interesting objects and a secondary step producing high-resolution imagery of just those objects.
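
A minimal sketch of that two-phase pipeline, assuming a hypothetical interest score, an arbitrary INTEREST_THRESHOLD of 0.7, and placeholder sensing functions in place of real sensor processing:

```python
# Sketch: coarse pass flags interesting cells, then only those cells are
# re-imaged at high resolution.
from typing import List, Tuple

INTEREST_THRESHOLD = 0.7  # assumed score above which an object is "interesting"

def coarse_survey(area: List[Tuple[int, int]]) -> List[Tuple[Tuple[int, int], float]]:
    """Low-resolution, long-range pass: return an interest score per cell."""
    # Placeholder scoring; a real system would use sensor data here.
    return [(cell, (cell[0] + cell[1]) % 10 / 10.0) for cell in area]

def high_res_imaging(cell: Tuple[int, int]) -> str:
    """Second pass: produce high-resolution imagery for one flagged cell."""
    return f"high-res image of cell {cell}"

area = [(x, y) for x in range(5) for y in range(5)]
flagged = [cell for cell, score in coarse_survey(area) if score >= INTEREST_THRESHOLD]
print([high_res_imaging(cell) for cell in flagged])
```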

Goals

Potential problem descriptions:

Trust

Anecdotal evidence indicates that the tipping point from distrust to trust is disturbingly narrow and steep. When presented with a functioning system operating safely in a constrained environment, individuals who distrust that system based on hearsay will exhibit a binary shift into a level of trust that is completely undeserved. They will go from wary to wandering around in front of it in a matter of minutes.

This problem is complicated by the need to distinguish between systems that are untrusted because of some element of operator interaction or pre-existing bias and systems that are untrusted because they are simply not good enough at the task (that is, they are not trustworthy).

Consensus

The group agreed that:

The group also concluded that, while it may be appropriate to require system developers to adhere to design practices that make verification easier (or even possible), doing so is impractical in practice: developers will focus on effectiveness and performance first, and will address verification only to the extent necessary and no further.

Research Challenges Identified

Several major research challenges within this area were defined, covering the development and debugging of models, how to determine an appropriate level of abstraction, new tools and techniques that must be developed, and how to test these new tools and techniques.

(Core problem: What is ENOUGH evidence to say that something is verified?) Running a few cases is not sufficient to verify something (unless those cases can be proven to encompass the range of situations and behaviors that need to be tested to characterize the system or its properties), but when a statistical or sampling approach must be taken, how many samples are enough? In cases where evidence is not quantitative, what is the basis for setting the threshold for enough evidence? (Existing regulatory authorities have thresholds for determining when a system has passed a certification process, but the foundations for those thresholds are murky at best. Fundamentally, we don’t know why these existing processes work, so we can’t develop new ones that cover the currently unworkable cases.)
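
One way to make the sampling question concrete, purely as a sketch: if the evidence is a pass/fail rate estimated from independent runs, a Hoeffding bound gives the number of samples needed to estimate a failure probability to within ±ε at confidence 1 − δ. The choice of ε and δ (here 0.01 and 0.05, picked arbitrarily) is exactly the threshold question that remains open.

```python
# Sketch: required sample count for estimating a failure probability by
# independent sampling, via the two-sided Hoeffding bound.
import math

def samples_needed(epsilon: float, delta: float) -> int:
    """n >= ln(2/delta) / (2 * epsilon**2) samples suffice for a +/- epsilon
    estimate with probability at least 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

# Example: estimate failure probability to within 1% at 95% confidence.
print(samples_needed(epsilon=0.01, delta=0.05))  # 18445 independent runs
```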

(Design) Does designing a system to be modular actually help long-term supportability, in the sense that verifying a new module and its interactions is a less intensive and complex process than re-verifying the entire system? This seems like it ought to be true for less complex automated systems, but is it still true for systems that are already intractably complex? In practice, research in this area has found that while theoretical approaches that satisfy certain assumptions do show a benefit, there are significant impediments to implementing them in practice. The biggest software problems arise at the module interfaces, where assumptions about the system change as different module creators submit their creations. Identifying the underlying assumptions in the interactions between modules that cause these problems remains unsolved.
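
One candidate framing for the interface problem is assume-guarantee style contracts, sketched below with hypothetical module and contract names: each module is re-checked against its stated assumptions and guarantees rather than against the whole system. Notably, the implicit assumptions the group identified as the real difficulty are exactly the ones such a contract does not capture unless someone writes them down.

```python
# Sketch: checking a module against an explicit interface contract.
from dataclasses import dataclass
from typing import Any, Callable, Iterable

@dataclass
class Contract:
    assumes: Callable[[Any], bool]     # precondition on the module's input
    guarantees: Callable[[Any], bool]  # postcondition on the module's output

def check_module(module: Callable[[Any], Any],
                 contract: Contract,
                 test_inputs: Iterable[Any]) -> bool:
    """Check a module against its contract on sampled inputs."""
    for x in test_inputs:
        if not contract.assumes(x):
            continue  # outside the stated assumptions; not this module's obligation
        if not contract.guarantees(module(x)):
            return False
    return True

# Hypothetical perception module: must return a non-negative range in meters.
range_contract = Contract(
    assumes=lambda raw: raw is not None,
    guarantees=lambda rng: isinstance(rng, float) and rng >= 0.0,
)
print(check_module(lambda raw: abs(float(raw)), range_contract, [1.0, -3.5, 7.2]))  # True
```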

Current Approaches

The group noted several potential approaches to some of these problems:

In all of these cases, we need a way to define the relationship between what we want the system to do and what it actually does. Equally, we need a way to ensure that developers are not writing the tests their own system will be subjected to.

Including verification at design time can improve the developers’ understanding of the system and help create more efficient designs.
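
As a sketch of what design-time verification can look like in the simplest case (the model, transitions, and reserve property here are all hypothetical), a small abstract model of the design can be exhaustively explored and an invariant checked on every reachable state, so the property is stated against the design rather than written by the developers as a test after the fact.

```python
# Sketch: exhaustive exploration of a tiny abstract design model, checking an
# invariant on every reachable state and reporting a counterexample if found.
from collections import deque

# Abstract model: (location, battery). Moving between home and store costs one
# unit of battery.
def successors(state):
    location, battery = state
    if battery >= 1:
        if location == "home":
            yield ("store", battery - 1)
        elif location == "store":
            yield ("home", battery - 1)

def invariant(state):
    location, battery = state
    # Hypothetical design property: away from home, keep at least one unit of
    # battery in reserve so the robot can never be stranded.
    return location == "home" or battery >= 1

def check(initial):
    seen, frontier = {initial}, deque([initial])
    while frontier:
        state = frontier.popleft()
        if not invariant(state):
            return False, state  # counterexample found at design time
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True, None

print(check(("home", 3)))  # (False, ('store', 0)): a reachable state breaks the reserve property
```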

Way Forward

Several next steps were identified: