research-article

Open access

Towards a Greater Understanding of Coordinated Vulnerability Disclosure Policy Documents

Authors:

Thomas Walshe,

Andrew Simpson

Digital Threats: Research and Practice, Volume 4, Issue 2

Article No.: 29, Pages 1 - 36

https://doi.org/10.1145/3586180

Published: 10 August 2023

Abstract

Bug bounty programmes and vulnerability disclosure programmes, collectively referred to as Coordinated Vulnerability Disclosure (CVD) programmes, open up an organisation’s assets to the inquisitive gaze of (often eager) white-hat hackers. Motivated by the question What information do organisations convey to hackers through public CVD policy documents?, we aim to better understand the information available to hackers wishing to participate in the search for vulnerabilities. As such, in this article we consider three key issues. First, to address the differences in the legal language communicated to hackers, it is necessary to understand the formal constraints by which hackers must abide. Second, it is beneficial to understand the variation that exists in the informal constraints that are communicated to hackers through a variety of institutional elements. Third, for organisations wishing to better understand the commonplace elements that form current policy documents, we offer broad analysis of the components frequently included therein and identify gaps in programme policies.

We report the results of a quantitative study, leveraging deep-learning-based natural language processing models, providing insights into the policy documents that accompany the CVD programmes of thousands of organisations, covering both stand-alone programmes and those hosted on 13 bug bounty platforms. We found that organisations often inadequately convey the formal constraints that are applicable to hackers, requiring hackers to have a deep understanding of the laws that underpin safe and legal security research. Furthermore, a lack of standardisation across similar policy components is prevalent, and may lead to a decreased understanding of the informal constraints placed upon hackers when searching for and disclosing vulnerabilities. Analysis of the institutional elements included in the policy documents of organisations reveals insufficient inclusion of many key components. Namely, legal information and information pertaining to restrictions on the backgrounds of hackers are found to be absent in a majority of policies analysed. Finally, to assist ongoing research, we provide novel annotated policy datasets that include human-labelled annotations at both the sentence and paragraph level, covering a broad range of CVD programme backgrounds.

1 Introduction

The growing popularity of Coordinated Vulnerability Disclosure (CVD) programmes amongst organisations looking to outsource components of their vulnerability discovery process has been well studied in academic literature [22, 91, 99]. Through the operation of such a programme, organisations open up their assets to a global community of white-hat hackers (hereafter referred to as hackers), in the hope of capturing the gaze of skilled individuals who may go on to discover and disclose complex security vulnerabilities [12], thus benefiting from the maxim “Given enough eyeballs, all bugs are shallow” [65].

While the term hacker may still conjure up thoughts associated with the nefarious activities of malicious actors, either in film or reality [27, 70], thousands of individuals are regularly involved in the search for vulnerabilities for legitimate purposes [93], routinely helping to secure systems and data in open source and commercial projects. Driven by altruism [90], economic incentives [49], or a desire to gain real-world technical experience [29], these individuals perform valuable security research without malicious intentions, instead often disclosing the details of discovered vulnerabilities to the owner, with the ambition of assisting the organisation in securing its assets or sometimes aspiring to be rewarded for their efforts. Usage of the term hacker to describe those individuals engaged in socially beneficial¹ security research is contentious due to the negative connotations that surround the term. A literature review of information systems research by Oliver and Randolph [56] explored the definitions and connotations of the term as used in academia. The authors note that through continued discourse regarding the biases introduced by authors, researchers may lean towards more positive uses of the term, thus helping to shift away from the generally negative connotations that they observed [56]. Perhaps unsurprisingly, ‘hacker’ and ‘hacking’ are used positively across bug bounty platforms to refer to their user-bases (often used interchangeably with ‘researcher’ on Bugcrowd, ‘hunter’ on YesWeHack, ‘ethical hacker’ on Intigriti, etc.) and feature in corporate branding and feature names (e.g., HackerOne, ‘Hacktivity’, ‘Hack the Pentagon’). Given the prevalence of positive associations of the term across commercial entities involved with CVD, and the expanded holistic definition of hacker proposed by Oliver and Randolph [56] that encompasses those who ‘benefit society’, we use it throughout the article, in a positive sense, to describe individuals engaged in legitimate security research.

Recent surveys have revealed insights into the backgrounds of those who participate in the search for vulnerabilities. A survey by Akgul et al. [4] included participant demographic information, identifying most to be young (18–29 years), male, and predominantly from North America, South Asia, or Western Europe. In Bugcrowd’s ‘Inside the Mind of a Hacker’ report,² a survey of the background of hackers who participate on the platform, it is reported that, of those that responded to the survey, 42% are full time and 26% are part time, suggesting that full-time participation may be viable for a large proportion of individuals on the platform. However, the viability of sustained participation of hackers on Bugcrowd is debated by Walshe and Simpson [93].

The concept of CVD programmes has grown in popularity as a mature security activity [71, 72, 74] that can prove to be a cost-effective security investment [22, 91, 99]. Bug Bounty Programmes (BBPs) and Vulnerability Disclosure Programmes (VDPs) represent two closely related, and sometimes overlapping, manifestations of CVD programmes that help facilitate the communication and interactions between hackers and organisations. As in many high-transaction cost markets in which information asymmetries exist [3], third-party intermediaries have emerged to aid both hackers and organisations in the operation of BBPs and VDPs. These bug bounty platforms offer a variety of products, services, and benefits to organisations operating a CVD programme, including assistance in programme setup and policy development, access to large user bases of active hackers, and vulnerability report verification [30]. For hackers, these platforms act as a centralised ecosystem in which organisations operate with seemingly similar postures towards security.

Although many platforms offer guidance on how to communicate the information necessary for effective and compliant participation in an organisation’s programme, there is a great deal of variation in the content of programme policy documents. For hackers (particularly those who are new participants), the differing language used across programmes, including those on the same platform, adds further complications that they must address. This can be particularly problematic if an individual does not fully understand the nuances of certain eligibility conditions, participation requirements, or legal clauses used across a range of programmes. As such, it is useful to further understand how key components and concepts are conveyed to hackers in the policy documents associated with a given CVD programme—the outcomes of which can be beneficial to both hackers and organisations looking to understand content and variability across CVD programmes.

The structure of the remainder of the article is as follows. The background of CVD programmes and a description of relevant previous studies are presented in Section 2, together with the motivation for the study discussed in this article. A detailed description of the data collection and modelling approaches is presented in Section 3. The results of the research questions introduced in the article are explored in Section 4, followed by further discussion in Section 5. The backgrounds of the organisations (both platform and programme operators) that make up the collected data lead to the results and discussion being predominantly U.S. focused. Finally, together with an outline of potential topics for future research, the contributions are summarised in Section 6.

2 Background and Motivation

In this section, we discuss the background of CVD programmes and the motivation for a renewed investigation into the content of the policies presented to hackers on programme pages. The use of bug bounty and responsible disclosure, collectively CVD, programmes within organisations, with reference to their inclusion in development frameworks, is explored in Section 2.1. After considering other prominent works concerning the imposition of constraints upon hackers, Section 2.2 places greater attention on the role of CVD policy documents in helping to guide the exchange of vulnerability information between organisations and hackers—of particular relevance is a study of HackerOne policies by Laszka et al. [42]. Having then identified the limitations of existing research, the motivation for the study and the set of research questions addressed throughout the remainder of the article are outlined in Section 2.3.

2.1 Background

Software development lifecycle frameworks outline the relationship between the vital stages that comprise the entire development process of an application [15], with such frameworks typically being sequential or iterative in nature [66]. Despite the growing need for secure products [36], there exist misaligned incentives between organisations and consumers as to the level of security investment that is desirable within the development process [35]. Secure software development lifecycles offer an alternative to, or an extension of, traditional software development lifecycles by encouraging security investment throughout the development lifecycle [8]. Not only does the continued consideration of security benefit the final product, but, by considering security in the earlier stages, the occurrence of costly bugs and flaws can be prevented—thus decreasing the cost of later fixes [64].

There are several published secure software development lifecycle frameworks that guide developers through the journey of the development lifecycle with appropriate consideration for security. These include the NIST SSDF (Secure Software Development Framework) [79], the OWASP SKF (Security Knowledge Framework) [58], Microsoft’s SDL (Security Development Lifecycle) [51], the BSA Framework for Secure Software (building upon the guidance from NIST’s SSDF) [85], and the BSIMM (Building Security In Maturity Model) developed and published by Synopsys [74].

In contrast to more theoretical frameworks, the BSIMM uses a data-driven approach to framework creation. In its latest incarnation, the security activities of 128 organisations are reviewed and compared to previous editions of the BSIMM [73]. This allows for the current popularity of a security activity to be quantified (and the growth or decline in popularity to be tracked), and for emergent activities to be added to the framework. Of particular interest to this work is CMVM (Configuration Management and Vulnerability Management) activity CMVM 3.4—the advocation for the operation of a BBP as a mature security activity. Although few specific recommendations are made, the inclusion of BBPs within the BSIMM, and the related growth in adoption over recent editions, demonstrates the increasing acceptance of CVD programmes as a viable security activity within organisations.

Within the aforementioned BSA Framework for Secure Software [85], organisations are encouraged to implement a policy that covers situations involving the disclosure of vulnerability information from a hacker to the organisation. The employment of a VDP is recommended for this purpose in Sections RV 1.3 [79] and VM 3.x [85], respectively. In ‘diagnostic statements’ (the equivalent of a BSIMM security activity) VM 3.1 through VM 3.5, the BSA framework outlines specific considerations that are relevant to an organisation’s VDP. Of particular relevance to this work is VM 3.3 [85]:

The vendor publishes, in simple and clear language, its policies for interacting with reports, addressing, at minimum: (1) how the vendor would like to be contacted, (2) options for secure communication, (3) expectations for communications from the vendor regarding the status of a reported vulnerability, (4) desired information regarding a potential vulnerability, (5) issues that are out of scope for the vulnerability disclosure program, (6) how submitted vulnerability reports are tracked, and (7) expectations for whether and how a reporter will be credited.

Despite the framework highlighting the need for a clear policy alongside an organisation’s VDP, only seven criteria are listed and no accompanying example policies or clauses are provided.

As discussed in Section 1, third-party bug bounty platforms act as an intermediary between hackers and organisations operating a BBP or VDP that is hosted on the platform. As an alternative to a stand-alone programme, organisations may choose to host their CVD programme on a platform to gain visibility amongst a user base of active hackers [93], and make use of platform tools to ease programme management and operation (e.g., communication systems, payment systems, and user verification). As discussed by Walshe and Simpson [92], organisations may find these features and services burdensome to manage internally and opt to use a platform to help implement them, although some programme operators within the survey note the prohibitively high costs associated with platform usage. The inhibitive platform costs and a lack of perceived added value may lead organisations to continue to operate stand-alone CVD programmes [92]. For hackers, platforms provide the convenience of an integrated ecosystem, removing the hassles of repeatedly verifying identities and payment details, and needing to navigate differing communication channels. In addition, platforms can act as a centralised repository for the constituent public and private programmes in which a hacker may participate.

2.2 Related Work

Although there has been significant research interest in recent years surrounding the use of CVD programmes, there has been rather less research covering the policies employed by organisations.

Earlier work by Kuehn and Mueller [41] explores software vulnerability markets, paying particular attention to BBPs through the lens of the theory of institutional economics of North [55]. In Institutions, Institutional Change and Economic Performance [55], North outlines the key concepts underpinning institutional economics and the effect institutions have upon the economic performance of a given system (e.g., within a country or market). Institutions (the “rules of the game” [55]) comprise both formal constraints (rules that can be explicitly enforced, like laws) and informal constraints (conventions, social norms, and codes of conduct) that are imposed upon participants. The interplay between the formal and informal constraints helps structure interactions between individuals. Although the analysis by Kuehn and Mueller [41] does not extend the theory of institutions to cover bug bounty platforms, it may be prudent to briefly consider the present-day structure through this lens.

In a scenario involving a hacker, an organisation (operating a CVD programme), and a platform acting as the third-party intermediary, all are bound to transact and operate in accordance with the institutions. Relevant legal entities impose a set of formal constraints upon all parties. For example, U.S.-based hackers will be subject to the Computer Fraud and Abuse Act (CFAA) [81] and the Digital Millennium Copyright Act (DMCA) [83]—both of which are enforceable by the federal government—when conducting security research (activities conducted in certain U.S. federal assets may also be subject to the National Information Infrastructure Protection Act of 1996 [82]). As another example, U.S.-based organisations and the platform may be subject to the Anti-Money Laundering Act (AMLA) [84], export restrictions and sanction requirements when transacting with hackers—particularly if bounties are paid (for a further discussion of the complexities around the global flow of information and money in the context of CVD programmes, see the contribution of Zhao et al. [100]).

Both the organisation and the platform will impose additional informal constraints upon the hackers. Typically, platforms will require hackers to abide by a code of conduct to discourage undesirable behaviour (e.g., spam reports, reputation farming, and abusive communication). To enforce such rules, a platform typically has the power to ban or suspend user accounts. Organisations will impose constraints upon the hacker through programme policies that outline rules, guidelines, and conditions for eligible participation. However, unlike the platform in this scenario, the organisation can do little to directly enforce the informal constraints, aside from withholding any potential payments.

As highlighted by North [55], when the cost of transaction is prohibitively high (as it is in vulnerability markets [41, 52]) and enforcement of the formal constraints can be costly, the parties must rely on informal constraints for the market to function. Therefore, understanding these informal constraints is of particular importance. Kuehn and Mueller [41] identify four institutional elements that help facilitate a market: procedures, technical specifications, terms and conditions, and acknowledgments and reputation.

Work undertaken by Laszka et al. [42] is of particular relevance to the premise of this study. Laszka et al. [42] analyse the general statistics and policy documents of 111 CVD programmes (as listed on HackerOne in 2016). To further investigate the contents of the policy documents, the authors first qualitatively define a general taxonomy that attempts to understand the distinct components that organisations use to communicate programme rules. Consequently, a taxonomy of 12 components is constructed and used to classify sections of the 111 collected policy documents. The presence of the components is then investigated with respect to the reporting statistics of each programme. The authors argue that programmes “are on average associated with better success characteristics if the level of comprehensiveness of their rules of engagement increases” [42]. However, in the absence of causal analysis, it is not possible to determine the impact of a component on the success (or otherwise) of a programme.

2.3 Motivation

The focus of the study described in this article is, in part, inspired by the qualitative component of the work undertaken by Laszka et al. [42], with a view to expanding the analysis of policy documents beyond the narrow focus of prior work. Furthermore, in consideration of the rules that organisations now include in a policy document, the lens of institutional theory is focused upon CVD programmes [41, 55].

BBPs sometimes offer eye-catching rewards to hackers for the discovery and disclosure of certain highly prized vulnerabilities [91], with some programmes advertising potential single-report payouts in the region of millions (and sometimes tens of millions) of dollars (e.g., the $10,000,000 Wormhole programme³). Although hackers may be motivated to search in a particular organisation’s assets, aspiring to earn lucrative rewards, they may not consider the implicit or explicit risks associated with participating in a programme, such as the varying levels of supposed legal protection conferred through an organisation’s Safe Harbour clauses [18, 19]. It follows that, depending on the programme policy, a hacker will be exposed to varying levels of risk. Note that a Safe Harbour clause can do little, if anything, to immediately protect a hacker from third-party legal action. Furthermore, an organisation might condition reward eligibility on the assignment or licensing of intellectual property (the report) by the hacker to the organisation. Whether or not these clauses are clearly outlined before or after the submission of a report may impact a hacker’s decision to participate.

While grandiose maximum payouts may be advertised, these are but an indication of the maximum amounts that could be rewarded. Organisations are not bound by these figures, nor are they bound to paying out any bounty upon the receipt of a valid vulnerability report. As highlighted by the experience of Miller [52] in negotiating the sale of software vulnerabilities through a broker, and underpinned by the work of Arrow [7] concerning information as a commodity, once a hacker reveals too much information about a vulnerability (e.g., when proving its existence), the information asymmetry is resolved, diminishing the value of the information offered by the hacker. Perhaps this highlights the importance of the current informal constraints that lead an organisation to reward a hacker in the case of a bug bounty; however, as noted in Section 2.1, there are formal constraints that may prevent an organisation from doing so. Additional complications may be raised by the proclivity of Web3-based organisations to award payouts in cryptocurrencies rather than in fiat currencies. These rewards appear to be, at times, slightly disingenuous, as the bounties will be advertised in USD, yet the policy will specify that the payouts may be issued in an equivalent amount of cryptocurrency (at the discretion of the organisation). Therefore, to avert later complications, it is necessary for hackers to understand the institutions concerning the rewarding process.

There may also be benefits to organisations that want to better understand the current policies published by other organisations that operate CVD programmes. As explored by North [55], albeit in case studies unrelated to this topic, participants should adapt to the institutional change that occurs. In the context of CVD programmes, it may be beneficial for organisations to alter their policies over time as norms change. As such, the identification of current normative policies and rules allows laggards an opportunity to modify their programmes and ‘catch up’.

It follows that renewed investigation into the CVD policies put forward by organisations, across a breadth of platforms and sources, would have the potential to provide both hackers and organisations a greater understanding of the formal and informal constraints contained within. To help answer our research question What information do organisations convey to hackers through public policy documents?, we consider the following subsidiary questions:

(1) What are the formal constraints communicated to hackers within the policy, particularly within legal statements? How do these impact hackers?

(2) What is the variation in the informal constraints that are communicated to hackers?

(3) To what extent do organisations include institutional elements in their CVD programme policies?

Question 1 aims to identify the formal constraints imposed upon hackers, particularly those that are present in the relevant legal components as defined by Laszka et al. [42]. It also considers the various impacts these might have on the actions of hackers. In a similar vein, the ambition of Question 2 is to identify the variation in the informal constraints imposed upon hackers, helping to link the taxonomy of Laszka et al. [42] to institutional economic theory [41, 55]. Question 3 seeks to comment on the policies as a whole, using insights garnered from answering the previous questions, to better understand the current state of CVD policies across multiple platforms and stand-alone programmes.

3 Methodology

We discuss the study’s methodology in this section. Considerations for the data underpinning the study are presented in Section 3.1. This includes details concerning the sourcing of the data from bug bounty platforms and stand-alone programmes, the coverage of the data, and the extent to which data was collected. Section 3.2 introduces the annotation schemes used to categorise data in preparation for use in the creation of supervised Natural Language Processing (NLP) models. Explosion AI’s Prodigy annotation tool was used by the lead author to create annotated datasets for both Named Entity Recognition (NER) and text-classification downstream tasks. The NLP models, alongside a collection of more traditional pattern matching models, are discussed in Section 3.3 and provide the basis for the results presented throughout the remainder of the article. A summary of the end-to-end data pipeline used for document processing is presented in Section 3.4, and includes a description of the inference and extraction steps necessary to analyse the CVD policy documents. The factors affecting the replicability and reproducibility of the study are discussed in Appendix A, and further details can be found in the GitHub repository for the study.⁴

3.1 Data Collection

Unlike previous work, which has drawn on a limited number of sources (particularly prominent bug bounty platforms such as HackerOne and Bugcrowd), data for this study was collected from 13 bug bounty platforms and a wide selection of independent (stand-alone) bug bounty and responsible disclosure programmes. This included a selection of former programmes that have been shut down or are otherwise inactive. Platforms were selected using online repositories, such as the ‘Open-Sourced Collection of Bug Bounty Platforms’,⁵ with the added criteria that information for at least one programme was public, that content was available in English, and that the platform website was available at the time of data collection.

In total, the public policy documents of 1,243 programmes were collected from the aforementioned bug bounty platforms. As commented by Christian [14], bug bounty platforms are host to a significant number of organisations that choose to operate private, invite-only programmes, that cannot be accessed or viewed without prior arrangement. As such, the policies for these private programmes could not be collected. An additional 3,390 organisations thought to operate a stand-alone CVD programme were investigated, and any relevant data downloaded. After filtering out programmes that did not have any viewable policy information, 555 stand-alone CVD programmes remained.

A summary of the data sources is provided in Table 1. The U.S.-based platform HackerOne⁶ is the single largest unified source of data, and, along with Bugcrowd⁷ (which is also U.S. based), the platform is frequently used as a source of data for research pertaining to the use of BBPs [42, 91, 99, 101]. Unlike the other sources, the Immunefi⁸ and HackenProof⁹ platforms focus almost exclusively on Web3-based programme assets (e.g., smart contracts). The list of independent programmes was curated from publicly available repositories¹⁰ of organisations known to have—or thought to have—accepted vulnerability disclosures.

Source	Programmes	Country of Registration
HackerOne	526	U.S.A.
Immunefi	280	Singapore
Bugcrowd	224	U.S.A.
Intigriti	76	Belgium
HackenProof	53	Estonia
YesWeHack	30	France
BugBounty.jp	20	Japan
BugBase	10	India
WhiteHub	8	Vietnam
Security@me	7	Australia
RedStorm	6	Indonesia
Crowdswarm	2	U.A.E.
Huntr	1	U.K.
Independent	3,390*	Global

Table 1. Summary of the Sources Used to Identify and Collect Public CVD Policy Documents, as of 16 April 2022

*Note that only 555 of the 3,390 stand-alone, independently hosted programmes investigated contained useful data. Unfortunately, the number of private programmes that could not be collected is not known, as these are typically hidden from public view (the lack of private programme information available has previously been noted as a barrier to research [14]).
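As a quick consistency check, the per-platform counts in Table 1 can be summed and compared with the totals quoted in the text. The sketch below only restates figures from the table and its note; the variable names are illustrative and are not taken from the study's code.

```python
# Programme counts per platform, copied from Table 1.
PLATFORM_COUNTS = {
    "HackerOne": 526, "Immunefi": 280, "Bugcrowd": 224, "Intigriti": 76,
    "HackenProof": 53, "YesWeHack": 30, "BugBounty.jp": 20, "BugBase": 10,
    "WhiteHub": 8, "Security@me": 7, "RedStorm": 6, "Crowdswarm": 2,
    "Huntr": 1,
}
INDEPENDENT_INVESTIGATED = 3390  # stand-alone organisations investigated
INDEPENDENT_WITH_POLICY = 555    # those with viewable policy information

# Total number of platform-hosted programmes.
platform_total = sum(PLATFORM_COUNTS.values())

# Platform-hosted programmes plus all stand-alone organisations investigated.
all_sources_total = platform_total + INDEPENDENT_INVESTIGATED

# Share of investigated stand-alone organisations that yielded usable policies.
standalone_share = INDEPENDENT_WITH_POLICY / INDEPENDENT_INVESTIGATED

print(platform_total)             # 1243
print(all_sources_total)          # 4633
print(f"{standalone_share:.1%}")  # 16.4%
```

The platform total agrees with the 1,243 programmes reported in the text, and roughly one in six of the stand-alone organisations investigated had a viewable policy.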

For each programme, all relevant data in plaintext or markdown format pertaining to the organisation’s CVD programme was collected in early April 2022. Pertinent data presented in tabular formats was also collected using custom Selenium or Beautiful Soup Python scripts. Where possible, raw markdown data was preferred over rendered markdown text owing to the greater flexibility allowed for in later processing steps. The data collected included the following:

• General programme statistics: Programmes hosted on bug bounty platforms typically display statistics that reference certain operational characteristics. For example, statistics relating to the response efficiency of the organisation (average time to respond, time to bounty, etc.) may be useful for hackers to better understand the interactions they may have after the submission of a report. In addition, figures pertaining to the quantity of reports submitted and resolved, alongside the corresponding bounty payout totals, give an indication of the previous activity of hackers in relation to the programme [91]. Although the general statistics are not the focus of the study herein, the data was collected and made accessible to researchers via a GitHub repository to support wider research.¹¹

• Policy documents: For all programmes identified in Table 1, the published publicly viewable policy documents (the content of which varies widely) were collected. CVD programme policy documents may contain, but are not limited to, company mission statements, programme descriptions, scope definitions, potential rewards, guidelines, rules, legal statements, submission criteria, and submission instructions. An organisation may choose to include some, or none, of the aforementioned policy components, as well as any other information that is believed to be relevant to the operation of the CVD programme. Although there may be a notion of standardisation in terms of policy content and format when viewing documents across a single platform, there can be significant variability across platforms or independent programmes.

• Scope tables: Formally defined scope tables detail the in-scope and out-of-scope assets, highlighting areas in which a hacker is permitted to search for vulnerabilities, and those from which they are forbidden. For programmes on the HackerOne platform, these were collected separately from the policy documents.

• Bounty tables: The value of the bounty paid out for a given vulnerability report may be conditioned on several factors specified by an organisation through the use of a bounty table. Often, organisations will directly link the bounty to the CVSS score assigned to the vulnerability underlying a report. However, certain classes of vulnerabilities, such as remote code execution, may hold more importance to the organisation when identified in certain assets. As such, organisations may instead use a detailed series of bounty tables to convey the payouts awarded to specific vulnerabilities. For programmes on the HackerOne platform, these were collected separately from the policy documents.

• Programme updates: Certain bug bounty platforms allow programme operators to publish update messages separately from their main policy documents. Typically, these updates contain information that is more time sensitive than programme information conveyed elsewhere. As such, these updates enable increased visibility for information pertaining to limited time events and promotions, notice of noteworthy programme changes (e.g., termination of a programme), and opportunities for hackers to provide feedback to the operators through linked surveys. For programmes on the HackerOne platform, these were collected separately from the policy documents.

•

Document version histories: Both the HackerOne and Intigriti platforms offer the full version histories for the content displayed on programme pages by way of markdown documents with changes displayed in the unidiff format. This allows for full transparency of the incremental changes made by the operators over a programme’s lifetime. Only the HackerOne version histories, in markdown format, were collected.

A summary of the data collected is shown in Table 2. Following the collection of the data, all information and associated metadata was stored in various SQL database tables for retrieval in later stages.

Table 2.

Type	Count
General	526
Policy	4,633
Policy changes	68,693
Scope table entries	6,576
Scope table changes	14,479
Bounty table entries	379
Bounty table changes	7,030
Updates	601

Table 2. Summary of the Types of Information Collected from the CVD Programmes, from the Sources Listed in Table 1, as of 16 April 2022

Where possible, data was collected in markdown format to help preserve the structure of the data. Changes to policies, scopes, and bounties were commonly collected in the unidiff format.
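
To illustrate what a change record in the unidiff format looks like, the following is a minimal sketch using Python's standard `difflib` module. The two policy snippets, the file names, and the reward values are invented for illustration and are not taken from the collected data; the platforms' own diff output may differ in detail.

```python
import difflib

# Two hypothetical versions of a markdown policy snippet (invented for illustration).
old_policy = (
    "# Scope\n"
    "*.example.com is in scope.\n"
    "Rewards range from $100 to $1,000.\n"
).splitlines(keepends=True)
new_policy = (
    "# Scope\n"
    "*.example.com is in scope.\n"
    "Rewards range from $100 to $5,000.\n"
).splitlines(keepends=True)

# difflib.unified_diff yields the lines of a unified diff (unidiff).
diff_lines = list(
    difflib.unified_diff(
        old_policy, new_policy, fromfile="policy_v1.md", tofile="policy_v2.md"
    )
)

# The first two lines are the '---' / '+++' file headers; the rest are hunks.
body = diff_lines[2:]
removed = [line[1:].rstrip("\n") for line in body if line.startswith("-")]
added = [line[1:].rstrip("\n") for line in body if line.startswith("+")]

print("".join(diff_lines))
print("removed:", removed)
print("added:", added)
```

Storing the leading `-` and `+` marks separately from the text, as done here, keeps the changed sentences available for the same text processing that is applied to full policy documents.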

3.2 Annotated Dataset Curation

To allow for the training of supervised Machine Learning (ML) models that are capable of utilising semantic features in the text, both paragraph-level multi-category labelling and sentence-level entity labelling were employed. This resulted in the creation of more than 12,000 unique hand-labelled sentences (representing 9.7% of the 123,440-sentence corpus) and more than 3,000 hand-labelled paragraphs (representing 7.6% of the 39,748-paragraph corpus). The publication of these labour-intensive annotated datasets helps enable further research reliant on large annotated corpora.

The annotation process was guided by Bloomberg’s ‘Best Practices for Managing Data Annotation Projects’ [87], which also served as background for the creation of the annotation guidelines. Paragraphs and sentences (paragraphs split using the NLTK sentence tokenizer¹²) were randomly selected across the entire corpus for annotation within their respective tasks. The annotation itself was undertaken by a single coder (the first author). As such, the quality of the annotated datasets may be limited due to authorship biases.

Summaries of the categories and entities in the annotated datasets are shown in Tables 3 and 4, respectively. Data for annotation was selected with uniform probability across a de-duplicated dataset from all data sources and includes all content types. The annotation work was carried out using the Prodigy¹³ annotation tool developed by Explosion AI using a research licence.

Table 3.

Category	Count
COMPANY-STATEMENT	662
REWARD-EVALUATION	466
SCOPE-IN	284
VULN-ELIGIBLE	219
VULN-INELIGIBLE	211
PROHIBITED-ACTIONS	182
GUIDELINES-SUBMISSIONS	174
SCOPE-OUT	155
ENGAGEMENT	119
GUIDELINES-DISCLOSURE	77
LEGAL	71
PARTICIPANT-RESTRICTIONS	38

Table 3. Summary of the Categories of the 2,618 Labels Assigned to the 3,003 Annotated Paragraphs

Note that with the multi-classification approach, a single paragraph may be assigned multiple labels if it contains components pertaining to multiple categories. A paragraph may be assigned no label if it does not conform to the pre-defined categories. In the context of the taxonomy, 955 of the 3,003 randomly selected paragraphs were not assigned a label as the information was not relevant (e.g., markdown headers, markdown used for formatting without content, and conversational or seasonal messages such as ‘Happy New Year, Hackers!’), could not be assigned an appropriate label due to lack of surrounding context (e.g., lone URLs), or the text was non-English.

Table 4.

Type	Count
ORG	3,692
ASSET	3,663
VULN	3,374
STKHLDR	1,208
TECH	1,030
PRODUCT	662
SYMBOL	500
LAW	467
PI	454
SEVERITY	450
DATE	378
GPE	139
PROMO	132
PERSON	61
IP	55

Table 4. Summary of the Entity Types of the 16,262 Entities Annotated Within the 12,526 Sentences

Annotations are represented in the dataset by a non-overlapping character span defining the start and end of an entity, and a label corresponding to the entity type. Sentences will contain zero or more entities, and can contain multiple entities of the same type. Those with zero entities include mined hard-negative samples to help improve the robustness of downstream models [45].
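
The span-based representation can be sketched as follows. This is an illustrative data structure only, not the schema of the published dataset: the class and field names are hypothetical, and treating the end offset as exclusive is an assumption. The sentence and its two PI entities are the example given in the definition of the PI entity type; the zero-entity sentence is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    start: int  # character offset where the entity begins (inclusive)
    end: int    # character offset where the entity ends (exclusive, by assumption)
    label: str  # entity type, e.g. "PI"


def make_span(text, phrase, label):
    """Locate the first occurrence of phrase in text and wrap it as a span."""
    start = text.index(phrase)
    return EntitySpan(start, start + len(phrase), label)


def spans_overlap(spans):
    """Return True if any two spans share at least one character."""
    ordered = sorted(spans, key=lambda s: s.start)
    return any(a.end > b.start for a, b in zip(ordered, ordered[1:]))


sentence = (
    "please refrain from accessing or retaining any customer "
    "credit card or personal contact information"
)
annotation = {
    "text": sentence,
    "spans": [
        make_span(sentence, "credit card", "PI"),
        make_span(sentence, "personal contact information", "PI"),
    ],
}

# A sentence with zero entities (hypothetical example of a negative sample).
negative = {"text": "Thank you for your help.", "spans": []}

for span in annotation["spans"]:
    print(span.label, repr(sentence[span.start:span.end]))
```

A check such as `spans_overlap` makes the non-overlap property explicit, which is useful when annotations from several sources are merged.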

As described in Section 2, Laszka et al. [42] conducted policy analysis on 111 BBPs hosted on the HackerOne platform. Within the qualitative component of the work, the authors develop a taxonomy of 11 policy components that organisations use to convey information to hackers about the programme. The components within the taxonomy created by Laszka et al. [42] form the basis for the categories used to annotate the policy documents within this study, which are given in the following. (Shown in brackets are the actual labels used in the annotations.) Many of the elements of the taxonomy are highlighted by Akgul et al. [4] as being considered important to hackers (e.g., communication of the scope and payment rules). Examples of the taxonomy elements can be found in Appendix B. The taxonomy components are as follows:

•

In-scope areas (SCOPE-IN): Paragraphs or statements that include an explicit definition of in-scope areas or assets that are considered to be part of the CVD programme.

•

Out-of-scope areas (SCOPE-OUT): Content that includes the explicit definition of out-of-scope areas or assets that a hacker is forbidden to explore for the purpose of security research.

•

Eligible vulnerabilities (VULN-ELIGIBLE): Definition of specific vulnerability types or classes that an organisation considers to be eligible for submission.

•

Non-eligible vulnerabilities (VULN-INELIGIBLE): Specific vulnerability types or classes that are ineligible for submission. Typically, this includes vulnerabilities considered to be of low or no severity to the organisation, omission of best practices, or non-technical vulnerabilities (e.g., phishing of employees or customers). For example, cross-site request forgery, vulnerabilities requiring outdated or otherwise unsupported web browsers, username or email enumeration, and the discovery of descriptive error messages are often marked as ineligible.

•

Deepening engagement with organisations (ENGAGEMENT): As defined by Laszka et al. [42], this category of statement includes instructions to hackers as to how they can better ‘engage’ in the discovery of security vulnerabilities. This can include directions from the organisation to the hacker on how to set up dedicated test accounts, use specific testing domains, and configure user accounts to access hidden development/testing features.

•

Prohibited or unwanted actions (PROHIBITED-ACTIONS): A hacker can take several actions during the process of vulnerability discovery that may be considered undesirable to the organisation. These include somewhat benign violations such as causing undue load on production servers resulting from the use of automated vulnerability scanning tools. Organisations may also explicitly state violations that they consider to be malign, or are otherwise illegal, such as denial of service attacks, accessing user or customer data (particularly if it contains personally identifiable information), or attacking client systems.

•

Legal clauses (LEGAL): Paragraphs or statements put forward by the organisation as part of the CVD policy may include several legal clauses relevant to the process of vulnerability discovery and the submission of subsequent reports. These include, but are not limited to, exemption from the DMCA [83] during the discovery process, authorisation of research activity in accordance with the CFAA [81], and assignment of intellectual property rights upon the submission of a report.

•

Participation restrictions (PARTICIPANT-RESTRICTIONS): Organisations commonly restrict previous or current employees, contractors, and their immediate family members from participating in an organisation’s CVD programme and claiming any subsequent rewards to prevent misuse of an individual’s privileged knowledge of, or access to, company or client systems. Furthermore, as many organisations are subjected to the laws of the United States, they are prohibited from awarding payouts to any individual currently located in, or ordinarily a resident of, any sanctioned country [86], and as such, they may prohibit participants from certain geographies. For U.S.-based organisations, this includes any hackers residing in entities in Country Group E (e.g., Cuba, Iran, and North Korea).

•

Submission guidelines (GUIDELINES-SUBMISSIONS): Specific guidelines about the information required, or desired, to be included in vulnerability reports may be put forward by organisations. This also includes any specific instructions on how to submit either the vulnerability report or accompanying material (e.g., video recordings).

•

Disclosure guidelines (GUIDELINES-DISCLOSURE): The programme policy may contain further information concerning the possibility of disclosure by the hacker, together with guidelines that a hacker is asked to follow before disclosing any details (e.g., waiting a period of 90 days following report submission).

•

Reward evaluation (REWARD-EVALUATION): If it is the case that separate bounty tables are not used by an organisation, any information pertaining to the potential rewards offered following the successful submission of a vulnerability will be outlined within the policy document. This includes paragraphs or statements that detail the value of a report in terms of CVSS score or vulnerability class, as well as any additional conditions that will materially alter the payout, such as increased bounty payouts from high-quality reports, rewards offered to the second discoverer of a vulnerability (typically diminished or non-monetary), and bonuses awarded to repeat discoverers.

•

Company statements (COMPANY-STATEMENT): Any paragraph or statement that includes general information about an organisation’s posture towards security or the CVD programme in question.

To gain a deeper understanding of the content included within policy documents, it is necessary to also explore the more fine-grained semantic features that are present. As such, it is useful to annotate the entities that exist within the content for later use in an NER model. Similar to the qualitative approach employed by Laszka et al. [42], we attempt to capture the generic components (entities) that form a policy document, with this being done at a sentence level. We do not focus on capturing security-related elements; rather, we identify more general information typically targeted by NER systems, allowing for broader insights to be drawn from the analysis. To this end, the following 15 unique entity types are defined:

•

Organisation name (ORG): Any reference to an organisation, not necessarily the organisation responsible for the CVD programme. This is an entity type commonly included in NER datasets.

•

Asset (ASSET): References to any assets owned by an organisation or other third-party stakeholder. This includes assets that are both in- and out-of-scope to hackers.

•

Vulnerability or attack class (VULN): Specific vulnerabilities or attacks that are mentioned by an organisation in any context.

•

Stakeholder (STKHLDR): Any party that could be considered a stakeholder to the organisation, or a stakeholder of a third party. Examples include users, customers, employees, and contractors.

•

Technical reference or acronym (TECH): Any reference to any non-branded technology, term, or related acronym that is not inherently a security vulnerability or attack class. Examples include TLS, SSL, automated tools, and jailbreaks.

•

Product reference or acronym (PRODUCT): References to a specific product or service produced by an organisation. Examples include iOS, Android, and Windows.

•

Currency symbol (SYMBOL): To distinguish currency symbols from other acronyms or entities of interest, it is necessary to identify any symbols as a separate entity. Due to the prevalence of Web3-related programmes in the collected data, it is particularly useful to be able to correctly identify the plethora of cryptocurrency symbols that are present throughout. For programmes focused on protecting Web3-based assets, it is common for them to offer rewards in a specified cryptocurrency rather than USD equivalent. For example, payouts may be issued in USDT (Tether stablecoin) or LUNA (the now defunct Terra stablecoin).

•

Law or legal term (LAW): Any legal clauses or terms (aside from those relating to intellectual property and proprietary information, see the IP type) that are present in a given sentence. This includes references to specific laws and regulations (e.g., DMCA, General Data Protection Regulation, and CFAA), and other legalese (e.g., good faith).

•

Personal information (PI): References to any information that may be considered personal or personally identifiable information such as phone numbers, email addresses, credit card numbers, and physical addresses. Note that this entity does not refer to the underlying information but rather references to the types of information that are mentioned in the data. For example, the statement “please refrain from accessing or retaining any customer credit card or personal contact information” contains “credit card” and “personal contact information” as entities of the type PI.

•

Severity rating (SEVERITY): Any reference to the severity of a vulnerability.

•

Date (DATE): Dates or time periods mentioned within a sentence. This is an entity type commonly included in NER datasets.

•

Geopolitical entity (GPE): The identification of any geopolitical entities is useful for further analysis into mentions of legal jurisdictions (at either the country or state level), sanctioned countries, locations of registration, and laws specific to certain countries or other GPEs (e.g., the European Union). This is an entity type commonly included in NER datasets.

•

Promotional event (PROMO): Some organisations will routinely encourage hackers to search for vulnerabilities in recently launched products or services through the use of promotional events, and will often include bonuses on payouts for any vulnerabilities successfully discovered and reported. Furthermore, in an attempt to increase the interest of hackers in a programme (perhaps after the operators notice a drop in the perceived motivation of hackers), some organisations will use limited time events with increased bounties or special rewards to encourage hackers back. As such, it is of interest to identify any references to these promotional events.

•

Person (PERSON): Organisations may include the names of the operators, or references to successful hackers (either by name or alias) in the policy document or published update. This is an entity type commonly included in NER datasets.

•

Intellectual property and proprietary information (IP): References to any terms relating to intellectual property that exist within a sentence. For example, hackers may be required to grant irrevocable and unlimited licenses of their work (the vulnerability reports) to an organisation in order to be eligible for a payout. Although separate from intellectual property law, this type of entity also includes related references to proprietary information and trade secrets.

3.3 Modelling Approach

Three distinct modelling approaches were used to extract textual information from the policy documents. As described in Section 3.3.1, a supervised paragraph-level classifier was used to categorise the structural elements of a document. Two separate approaches were used in conjunction to extract sentence-level features from each document. As such, Section 3.3.2 describes the use of a deep learning based NER system to detect pre-defined entity types, and Section 3.3.3 outlines the use of traditional pattern-matching techniques to detect consistent entities.

3.3.1 Text Classification.

To classify a given piece of text into the pre-defined categories (as described in Section 3.2), a multi-label textual classification model was constructed using the spaCy¹⁴ NLP library [69]. A multi-label model is needed to accommodate the possibility of a piece of text simultaneously belonging to multiple categories. As the annotated corpus is formed of paragraphs, it is reasonable to expect that a paragraph may contain information pertinent to one, several, or none of the categories defined by Laszka et al. [42] in their taxonomy of policy components. For the sake of brevity, the architecture and hyper-parameter search for the text classification model are not discussed in depth. However, further details are provided alongside the model in the project repository.¹⁵ In essence, a standard spaCy pipeline with an additional ‘textcat_multilabel’ component was used for the model.

For training and testing, fivefold cross validation was performed using the annotated data. An average macro F-1 score of 0.589 was achieved. (The per-class results are shown in Table 5.) At present, a considerable class imbalance exists within the data that may impact the performance of classifiers trained over the dataset. Future work may help address this imbalance and alleviate the poor performance seen within some classes through the annotation of additional ‘hard’ positive examples in under-represented categories or through more extensive modifications to the model architecture [45].

Table 5.

Category	F-1
COMPANY-STATEMENT	74.0
REWARD-EVALUATION	78.6
SCOPE-IN	55.2
VULN-ELIGIBLE	59.7
VULN-INELIGIBLE	60.7
PROHIBITED-ACTIONS	38.3
GUIDELINES-SUBMISSIONS	75.4
SCOPE-OUT	31.6
ENGAGEMENT	50.0
GUIDELINES-DISCLOSURE	58.1
LEGAL	84.6
PARTICIPANT-RESTRICTIONS	40.0

Table 5. Per-class Fivefold Cross-Validation Metrics for the Multi-Label Text Classification Models Using the Annotated Datasets

Macro-averaged F-1 scores are reported in the table (averaged across the five folds).
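
As a quick consistency check, the reported average macro F-1 score can be recovered from the per-class values in Table 5, assuming it is the unweighted mean of the twelve per-class scores. The sketch below uses only the values printed in the table.

```python
from statistics import mean

# Per-class F-1 scores (in percent) as listed in Table 5.
per_class_f1 = {
    "COMPANY-STATEMENT": 74.0,
    "REWARD-EVALUATION": 78.6,
    "SCOPE-IN": 55.2,
    "VULN-ELIGIBLE": 59.7,
    "VULN-INELIGIBLE": 60.7,
    "PROHIBITED-ACTIONS": 38.3,
    "GUIDELINES-SUBMISSIONS": 75.4,
    "SCOPE-OUT": 31.6,
    "ENGAGEMENT": 50.0,
    "GUIDELINES-DISCLOSURE": 58.1,
    "LEGAL": 84.6,
    "PARTICIPANT-RESTRICTIONS": 40.0,
}

# Macro averaging gives every class equal weight, regardless of its size.
macro_f1 = mean(per_class_f1.values())
print(f"macro F-1: {macro_f1:.2f}%")

# The three weakest classes, which pull the macro average down.
weakest = sorted(per_class_f1, key=per_class_f1.get)[:3]
print("weakest classes:", weakest)
```

The mean is approximately 58.85%, which agrees with the reported score of 0.589. Because every class carries equal weight, the low scores for SCOPE-OUT, PROHIBITED-ACTIONS, and PARTICIPANT-RESTRICTIONS reduce the average noticeably.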

For a given passage of text, the output of the model is in the form

\begin{equation} \boldsymbol{\hat{Y}} = \left[\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N\right]^T \tag{1} \end{equation}

such that

\[ \hat{y}_i \in [0,1] \quad \forall i \in \lbrace 1,\ldots ,N\rbrace . \]

Here, $\boldsymbol {\hat{Y}}$ is the output vector containing elements $\hat{y}_i$, corresponding to the estimated probability that the input belongs to the $i^{th}$ class for each of the N classes. Note that the individual class probabilities are not normalised across all classes, and as such,

\begin{equation} \sum_{i=1}^{N} \hat{y}_i \in [0, N]. \tag{2} \end{equation}
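
The practical consequence of un-normalised class probabilities can be seen in a small sketch. It assumes that each class score comes from an independent sigmoid, which is one common way such outputs arise; the raw scores, the three-class subset, and the 0.5 decision threshold are hypothetical and are not taken from the trained model.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability in (0, 1), independently per class."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical raw scores for three of the N categories.
raw_scores = {"SCOPE-IN": 2.0, "VULN-ELIGIBLE": 1.5, "LEGAL": -3.0}

# Each element of the output vector lies in [0, 1] on its own.
y_hat = {label: sigmoid(z) for label, z in raw_scores.items()}

# The elements are not normalised across classes, so the sum may exceed 1.
total = sum(y_hat.values())

# Assumed decision rule: assign every label whose probability reaches 0.5.
THRESHOLD = 0.5
assigned = [label for label, p in y_hat.items() if p >= THRESHOLD]

print({label: round(p, 3) for label, p in y_hat.items()})
print("sum of probabilities:", round(total, 3))
print("assigned labels:", assigned)
```

Here the paragraph receives two labels at once, and a paragraph whose scores all fall below the threshold would receive none, which matches the one, several, or none behaviour required of the classifier.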

3.3.2 Named Entity Recognition.

An NER model was trained on the annotated sentences to detect the entity types described in Section 3.2. Unlike traditional pattern matching techniques (e.g., through the use of gazetteers or handcrafted features [53, 54]), the NER models employed can leverage large pre-trained transformer [89] (see BERT [17] and RoBERTa [47]) and non-transformer (CNN and LSTMs [97]) deep learning based NLP architectures to identify entities within sequences of word, wordpiece, or character-level embeddings [11]. The use of contextual embeddings [50] further benefits the NER models [44], improving upon their ability to generalise and detect out-of-sample entities when compared to traditional techniques [37]. As with the classification model outlined in Section 3.3.1, the spaCy NLP library is also used to construct the NER model. As before, the architectural details of the NER model are omitted from the article; however, further details are provided alongside the model in the project repository.¹⁶ In essence, a standard spaCy pipeline with an additional ‘NER’ component was used for the model.

For training and testing, fivefold cross validation was performed using the annotated data. The per-type results are shown in Table 6. As is typical in NER tasks, there is a dearth of entities in comparison to negative samples (tokens that do not belong to any of the pre-defined types) in the dataset. Little can be done to resolve the resulting imbalance between entities and non-entities that exists within the data. However, selective annotation may be used to help address the imbalance that exists between entity types by selecting ‘hard’ samples containing low-count entities for further human annotation.

Table 6.

Type	F-1
ORG	85.7
ASSET	78.7
VULN	75.1
STKHLDR	80.5
TECH	74.1
PRODUCT	67.5
SYMBOL	81.3
LAW	78.2
PI	76.8
SEVERITY	86.2
DATE	78.9
GPE	72.8
PROMO	55.7
PERSON	23.2
IP	74.5

Table 6. Per-Type Fivefold Cross-Validation Metrics for the NER Models Using the Annotated Datasets

Macro-averaged F-1 scores are reported in the table (averaged across the five folds). Evaluation is performed using an exact match criterion for both the type and span (the predicted entity spans must align exactly with the ground truth spans).
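
The exact match criterion can be made concrete with a short sketch. The function name and the gold and predicted spans below are hypothetical; entities are written as (start, end, type) tuples, and a prediction counts as correct only if all three values agree with a ground truth entity.

```python
def exact_match_prf(gold, predicted):
    """Precision, recall, and F-1 where only exact (start, end, type) matches count."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical ground truth and predictions for a single sentence.
gold = [(0, 9, "ORG"), (24, 27, "VULN"), (40, 50, "ASSET")]
predicted = [
    (0, 9, "ORG"),      # exact match
    (24, 30, "VULN"),   # correct type, but the span is too long: no credit
    (40, 50, "ASSET"),  # exact match
]

precision, recall, f1 = exact_match_prf(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under this criterion a partially overlapping prediction is penalised twice, once as a false positive and once as a missed entity, which makes the scores stricter than those from overlap-based evaluation. A per-type score follows by filtering both lists to a single entity type before calling the function.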

3.3.3 Pattern Recognition.

For entities that are consistent in their format, the use of regular expressions for pattern recognition is preferred due to the time saved in comparison to developing supervised models and annotating data. In the datasets collected for this study, there are several entities recognised using regex. These are listed next:

•

Email addresses: Occasionally, organisations will provide an email address by which hackers are able to disclose the details of a vulnerability, or to be used as a line of communication between hackers and the operators.

•

URLs: Any domain prepended with http, ftp, https, or www is identified as part of the URL extraction process.

•

Domains: Any remaining domain-like objects after URL extraction are detected using a further domain regex pattern that relaxes the conditions on the prepended information.

•

IPv4 addresses: Organisations may specify IP addresses within their policy documents or scope tables. Due to time constraints, only an IPv4 regex was implemented. (However, it was found that the number of IPv6 addresses in the data was small.)

•

Bounty values and ranges: Organisations will commonly publish bounty values, or a range of values, that may be paid out after the successful submission of a vulnerability report. For a range of currencies (USD, British pound, Euro, and Yen), these values and ranges are identified.

•

Markdown table rows: For organisations that publish documents in markdown format, data may be contained within markdown tables. As the first stage of table information extraction, each row and the corresponding data are identified.

•

Markdown tables: Groups of continuous markdown table rows are identified as part of the second stage of the markdown table extraction.
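
The seven pattern types listed above might be approached along the following lines. This is a simplified sketch only: the regular expressions are hypothetical, were written for this illustration, and are far less thorough than production patterns would need to be. The sample text, addresses, and amounts are invented.

```python
import re

# Hypothetical, simplified patterns (not the patterns used in the study).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# URLs: anything prepended with http, https, ftp, or www; no trailing punctuation.
URL = re.compile(r"(?:https?://|ftp://|www\.)[^\s<>()\[\]]*[^\s<>()\[\].,;:!?]")
# Domains: dot-separated labels ending in an alphabetic top-level label.
DOMAIN = re.compile(r"\b(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z]{2,}\b")
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")
AMOUNT = r"(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?"
# Bounty values and ranges for the dollar, pound, euro, and yen symbols.
BOUNTY = re.compile(rf"[$£€¥]\s?{AMOUNT}(?:\s?(?:-|to)\s?[$£€¥]?\s?{AMOUNT})?")
MD_ROW = re.compile(r"^[ \t]*\|.*\|[ \t]*$")


def markdown_tables(text):
    """Stage 1 finds table rows; stage 2 groups continuous rows into tables."""
    tables, current = [], []
    for line in text.splitlines():
        if MD_ROW.match(line):
            current.append([cell.strip() for cell in line.strip().strip("|").split("|")])
        elif current:
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables


sample = (
    "Report issues to security@example.com or via https://example.com/security.\n"
    "Testing is limited to 192.0.2.10; blog.example.org is out of scope.\n"
    "Rewards range from $100 - $5,000.\n"
    "\n"
    "| Severity | Bounty |\n"
    "| --- | --- |\n"
    "| Critical | $5,000 |\n"
)

emails = EMAIL.findall(sample)
urls = URL.findall(sample)
# Domains are searched for only after URLs (and emails) have been removed.
remaining = EMAIL.sub(" ", URL.sub(" ", sample))
domains = DOMAIN.findall(remaining)
ips = IPV4.findall(sample)
bounties = BOUNTY.findall(sample)
tables = markdown_tables(sample)

print(emails, urls, domains, ips, bounties, tables, sep="\n")
```

Removing URLs before searching for bare domains mirrors the ordering described above, where the domain pattern is applied to whatever domain-like objects remain after URL extraction.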

3.4 Document Pipeline

The overall document processing pipeline is displayed in Figure 1 and can be applied to all of the collected datasets, including data in markdown and unidiff formats. The process is as follows:

Fig. 1.

(1)

Cleaning: Basic cleaning is applied to the input text, including the removal and separate storage of any unidiff marks, and removal of undesirable unicode characters.

(2)

Paragraph splitting: Documents are split over double newline characters and stored as a list of paragraphs to be individually processed.

(3)

Classifier: Each paragraph is classified using the model outlined in Section 3.3.1 and the output vector stored in the document knowledge base.

(4)

Sentence splitter: Each paragraph is further decomposed into sentences using the NLTK sentence tokenizer.

(5)

NER: The entity recognition model outlined in Section 3.3.2 is used to identify and extract 15 entity types before storage in the document knowledge base.

(6)

Pattern recognition: The regular expression-based pattern recognition techniques introduced in Section 3.3.3 are used to identify and extract a further seven types of entities before storage in the document knowledge base.

(7)

Document knowledge base: Information derived and extracted from an input document is stored in a knowledge base. This centralised repository created for each document is used throughout the analysis.
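
The seven stages above can be sketched end to end as follows. This is a structural illustration under several assumptions, not the implementation used in the study: the classifier and NER stages are empty placeholders, a naive regular expression stands in for the NLTK sentence tokenizer, the handling of unidiff marks is omitted, only an email pattern represents the pattern recognition stage, and all function names and the sample document are hypothetical.

```python
import re

UNWANTED = {"\u200b", "\ufeff"}  # example characters only (zero-width space, BOM)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")  # hypothetical pattern


def clean(text):
    """Stage 1: drop undesirable characters (unidiff handling omitted here)."""
    return "".join(ch for ch in text if ch not in UNWANTED)


def split_paragraphs(text):
    """Stage 2: split the document over double newline characters."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def classify(paragraph):
    """Stage 3: placeholder for the multi-label classifier of Section 3.3.1."""
    return {}


def split_sentences(paragraph):
    """Stage 4: naive stand-in for the NLTK sentence tokenizer."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]


def recognise_entities(sentence):
    """Stage 5: placeholder for the NER model of Section 3.3.2."""
    return []


def match_patterns(sentence):
    """Stage 6: regular expression matching (only emails in this sketch)."""
    return {"emails": EMAIL.findall(sentence)}


def process(document):
    """Stage 7: collect everything derived from one document in a knowledge base."""
    knowledge_base = {"paragraphs": []}
    for paragraph in split_paragraphs(clean(document)):
        knowledge_base["paragraphs"].append({
            "text": paragraph,
            "scores": classify(paragraph),
            "sentences": [
                {
                    "text": sentence,
                    "entities": recognise_entities(sentence),
                    "patterns": match_patterns(sentence),
                }
                for sentence in split_sentences(paragraph)
            ],
        })
    return knowledge_base


document = (
    "Welcome to our programme.\u200b Please test responsibly.\n\n"
    "Report to security@example.com. Thank you!"
)
kb = process(document)
print(len(kb["paragraphs"]), "paragraphs")
print(kb["paragraphs"][1]["sentences"][0]["patterns"])
```

Keeping one knowledge base per document, keyed by paragraph and then by sentence, means that paragraph-level categories and sentence-level entities can be joined later without re-running any stage.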

4 Results

In Section 4.1, results pertaining to an investigation of the formal constraints communicated to hackers within programme policies are presented. This includes further exploration of the numerous sources of constraints that a hacker may encounter during vulnerability research and disclosure. The variation that exists within the informal constraints communicated throughout programme policies is quantified and discussed in Section 4.2. This is presented alongside descriptive statistics on the policies and allows for a comparison to previous related work. Finally, the extent to which organisations include institutional elements in their CVD programme is discussed in Section 4.3. An overarching analysis of the elements included within policy documents allows for under-communicated areas to be identified.

4.1 Formal Constraints

As discussed in Section 2.2, institutions, as defined by North [55], are composed of formal and informal constraints. Pursuant to the first research question, which was outlined in Section 2.3, we explore the current usage of formal constraints in CVD policy. As discussed by Hodgson [34], to avoid imprecision, we restrict our definition of formal constraints to government-defined legal constraints [95].

The methodology outlined in Section 3.3.2 allows for the identification of formal constraints using the LAW and IP entity tags that may be assigned to sentences as part of the NER model. Furthermore, from paragraphs classified as containing legal clauses, analysis of the formal constraints promulgated is performed.

4.1.1 A Consideration of the Sources of Constraints.

A hacker can be subject to a litany of legal obligations, arising from a number of unilateral contracts, en route to participating in an organisation’s CVD programme. The hacker must not only be aware of the rights and obligations outlined in each contract but must also understand the order of precedence in the event of conflict. For example, from Bugcrowd we have the following: “in the event of a conflict within the Legal Terms, the order of precedence shall be, in order of highest priority to lowest priority: any Bounty Brief, the Researcher Terms and Conditions, these Terms of Service, the Privacy Policy and then any other terms that comprise the Legal Terms”.¹⁷ When utilising a BBP hosted on a platform, hackers may be legally bound to or affected by any of the following contracts:

•

Platform General Terms and Conditions: Applicable to any individual that makes use of a platform’s websites.¹⁸

•

Platform User Terms and Conditions: Applicable to hackers that create, and make use of, a user account on a given platform.¹⁹

•

Platform Customer Terms and Conditions: Hackers may be affected by the contracts between the platform and a hosted organisation. For instance, on Bugcrowd, in the event of conflict, the Customer Terms and Conditions may supersede any ‘Terms of Use’ agreements from organisations that the hacker may be required to accept on the route to accessing the target systems.²⁰

•

Programme CVD policy: Dependent on the content of the policy, the programme policy may also act as a legally binding document between the hacker and an organisation.

•

Organisation Terms of Use/Service or EULAs: Access to, or use of, the target organisation’s assets (including websites, source code, software/hardware products, etc.) may also require the hacker to agree to additional binding documents.

•

Non-disclosure agreements: Upon the invitation to a private BBP, or after the disclosure of a vulnerability report to an organisation, a hacker may be asked to sign a non-disclosure agreement.

Although each of the aforementioned contracts may contain terms that contribute to the formal constraints imposed upon a hacker, for the purpose of this work we only consider those that arise from the policy documents. However, the number of contracts involved serves as an indication as to the complexity of the legal situation that a hacker must navigate.

4.1.2 Generic Constraints.

Table 7 provides a breakdown of the inclusion of formal constraints by type of hosting method. As noted by Elazar [20] in a 2018 presentation, organisations will frequently make vague or generic references to the laws with which hackers must comply. It was found that there are 691 (non-unique) phrases associated with a generic reference to formal constraints mentioned within the policy documents across all organisations, with the most common being “applicable laws” (212), “State laws” (122), “applicable Federal, State, and local laws” (27), “local laws” (19), and “laws or regulations of any country” (8).

Table 7.

	Any (#)	Any (%)	Generic (#)	Generic (%)	Specific (#)	Specific (%)
Platform	351	28.2	268	21.6	243	19.6
Stand-alone	57	10.3	57	10.3	6	1.2
Bugcrowd	158	70.5	146	65.2	140	62.5
HackerOne	117	22.2	103	19.6	37	7.0
Immunefi	40	14.3	7	2.5	34	12.1
YesWeHack	29	96.7	7	23.3	29	96.7
Intigriti	5	6.6	3	4.0	3	4.0
HackenProof	2	3.8	2	3.8	0	0
RedStorm	2	33.3	2	33.3	0	0

Table 7. Number of Programmes, Grouped by Programme Type (Platform Based or Stand-alone) or by the Hosting Platform That Contain Either Generic or Specific Formal Constraints at Least Once in the Policy Documents

Percentages are reported relative to the number of programmes in a given category.

For stand-alone programmes, it was observed that there exists a tendency to make generic references to formal constraints. Of those specifying any formal constraints, 57 (100%) included some generic references, whereas only 6 (8.8%) included specific constraints. Across all programmes on platforms, of those that include formal constraints, 268 (74.7%) included generic references and 243 (67.7%) included specific references. Further differences in the proclivity to include formal constraints are explored in Section 4.3. Differences exist between the platforms, particularly between the largest non-Web3.0 specific platforms (HackerOne and Bugcrowd). Although the specification of generic constraints varies considerably, it is found that they are included on a majority of all Bugcrowd programmes.

4.1.3 Specific Constraints.

Table 7 also shows a breakdown of the usage of specific formal constraints used by organisations. The CFAA of 1986 [81] and the DMCA of 1998 [83] are the two most common specific constraints mentioned within CVD programme policy documents, with 279 and 274 respective mentions. Know Your Customer laws are mentioned 78 times within the documents. There are 16 references to EU regulation 2016/679, otherwise known as the General Data Protection Regulation [77]. There are also references to several laws specific to California: Section 502(c) of the California Penal Code [16, 75] (4), the California Consumer Privacy Act of 2018 [43, 59] (3), and the California Privacy Rights Act of 2020 [26] (1). Specific to the UK, the Computer Misuse Act 1990 [60] is mentioned once. Despite the global nature of CVD programmes, it is perhaps surprising to see references to specific laws and regulations predominantly confined to the United States and Europe. However, this may stem from the English language bias of the policies analysed (a further discussion of limitations is presented in Section 5.4).

As noted in Section 4.1.2, there exist significant differences in the inclusion of specific constraints between stand-alone programmes and those hosted on a bug bounty platform, with very few stand-alone programmes (1.2%) including references to specific formal constraints. The French platform YesWeHack is found to have the greatest proportion of programmes (96.7%) that include specific formal constraints.

4.2 Informal Constraints

Before discussing the variation in the institutional elements that organisations include within CVD programme policies, it is useful to briefly describe the characteristics of the policy documents in the context of the results presented by Laszka et al. [42].

Shown in Figure 2 is a comparison of the lengths of policies collected from bug bounty platforms with those collected from stand-alone programmes. From the policies collected, it is clear that those from platforms (average of 1,165 words) are typically far longer than those from stand-alone programmes (average of 370 words). In the analysis of HackerOne programmes by Laszka et al. [42], the authors found an average length of 481 words, a minimum of 72 words, and a maximum of 1,744 words. In comparison, focusing on only HackerOne programmes, it is found that the average length is 964 words, with a minimum of 6 words (from the now defunct MixMax programme, “See policy and bounties at [URL]”, included URL redacted using the data pipeline) and a maximum of 4,442 words.

Fig. 2.

To continue the comparison with the work by Laszka et al. [42], Figure 3 shows the distribution of Flesch reading ease scores [24, 38] by policy length. The Flesch reading ease score is a popular heuristic that represents the readability of a passage of text using a 0 to 100 scale: a score closer to 100 indicates a simpler passage of text, and a score below 30 represents a difficulty appropriate for an individual with a university degree [24]. The average score for the policy documents is 35.32, which is lower than the average score of 39.6 reported by Laszka et al. [42]. Furthermore, 550 policies (approximately a third) have a score lower than 30, indicating a slight increase in the proportion of policies (up from 23.4% [42]) that can be considered somewhat difficult to read.
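As an illustration of how such a score is derived, the following minimal sketch applies the standard Flesch reading ease formula, 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The syllable counter is a crude vowel-group heuristic and the sentence splitter is a simple regular expression; both are assumptions made for illustration, so the scores it produces will not match those reported in this study exactly.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption): count vowel groups and
    discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch formula applied to naively tokenised text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical example passages, not taken from the corpus.
simple = "The cat sat on the mat. It was a good day."
dense = (
    "Participants must refrain from unauthorised exfiltration of personally "
    "identifiable information notwithstanding applicable jurisdictional regulations."
)

print(round(flesch_reading_ease(simple), 1))
print(round(flesch_reading_ease(dense), 1))
```

Very short, simple passages can score above 100 and dense legal phrasing can score below 0, which is why the 0 to 100 range is best read as a guide rather than a hard bound.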

Fig. 3.

To quantify the variation in the informal constraints communicated to hackers, we consider the output of the multi-label classification model (as described in Section 3.3.1) as a measure of the presence of constraints (both informal and formal) that are contained within a given paragraph. Table 8 shows a count of the dominant labels (those of the highest likelihood) assigned to the paragraphs of all policy documents. A measure of the variance of the constraints contained within the paragraphs is calculated by dimensionally reducing the 12-dimensional output vector for all paragraphs to a 1-dimensional space using a combination of principal component analysis [1] and t-SNE [88]. The variance is then computed amongst all points within a given category as dictated by the dominant labels. The lower the variance for a given category, the more similar paragraphs (of the given category) are across all policy documents and the less likely the paragraphs are to communicate a variety of different types of constraint.

Table 8.

Dominant Label	Count (All)	Variance (All)	Count (Platform)	Variance (Platform)	Count (Stand-alone)	Variance (Stand-alone)
COMPANY-STATEMENT	7,368	1,470.7	5,742	1,455.1	1,626	1,526.2
REWARD-EVALUATION	3,225	2,602.8	2,829	2,526.6	396	2,523.6
SCOPE-IN	2,454	1,202.2	2,310	1,241.1	144	488.4
VULN-ELIGIBLE	1,372	1,061.7	1,278	1,052.8	94	1,126.0
VULN-INELIGIBLE	1,335	1,348.3	1,164	1,351.0	171	1,326.8
GUIDELINES-SUBMISSIONS	1,143	1,437.5	751	1,506.2	392	988.1
SCOPE-OUT	987	1,034.4	940	1,053.9	47	651.1
PROHIBITED-ACTIONS	937	1,714.3	741	1,730.0	196	1,644.7
ENGAGEMENT	799	440.3	728	442.1	71	405.7
GUIDELINES-DISCLOSURE	421	825.5	323	888.7	98	608.5
LEGAL	368	98.2	310	88.0	58	155.1
PARTICIPANT-RESTRICTIONS	105	419.7	89	415.1	16	364.0

Table 8. Counts of the Dominant Label Assigned to All Policy Paragraphs Using the Multi-Label Classifier and the Variance of the Dimensionally Reduced (Using t-SNE) Content Vectors

A lower variance for a given label indicates that policy paragraphs in that category are less likely to contain information pertaining to other labels and thus are more focused.
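The grouping and variance computation behind Table 8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the label set is a hypothetical three-label subset, the scores are invented, and the reduction to one dimension uses only a first principal component (found by power iteration) in place of the combination of principal component analysis and t-SNE. The resulting values are therefore not comparable with those in the table.

```python
import statistics
from collections import defaultdict

# Hypothetical subset of the 12 labels, for illustration only.
LABELS = ["COMPANY-STATEMENT", "LEGAL", "SCOPE-IN"]


def dominant_label(scores):
    """Return the label with the highest classifier score."""
    return max(scores, key=scores.get)


def first_principal_axis(rows, iterations=100):
    """Mean-centre the rows and estimate the first principal axis by
    power iteration (a stand-in for PCA followed by t-SNE)."""
    dims = len(rows[0])
    means = [statistics.fmean(row[d] for row in rows) for d in range(dims)]
    centred = [[row[d] - means[d] for d in range(dims)] for row in rows]
    # Start from the longest centred row so the start vector lies in the data's span.
    axis = max(centred, key=lambda row: sum(v * v for v in row))
    for _ in range(iterations):
        projections = [sum(v * a for v, a in zip(row, axis)) for row in centred]
        candidate = [
            sum(p * row[d] for p, row in zip(projections, centred))
            for d in range(dims)
        ]
        norm = sum(v * v for v in candidate) ** 0.5
        if norm == 0.0:
            break
        axis = [v / norm for v in candidate]
    return centred, axis


def variance_by_dominant_label(paragraphs):
    """Project every score vector to 1-D, then compute the population
    variance of the projected points within each dominant-label group."""
    rows = [[p[label] for label in LABELS] for p in paragraphs]
    centred, axis = first_principal_axis(rows)
    grouped = defaultdict(list)
    for scores, row in zip(paragraphs, centred):
        point = sum(v * a for v, a in zip(row, axis))
        grouped[dominant_label(scores)].append(point)
    return {label: statistics.pvariance(pts) for label, pts in grouped.items()}


# Invented classifier outputs: LEGAL paragraphs are near-identical,
# COMPANY-STATEMENT paragraphs mix in other labels to varying degrees.
paragraphs = [
    {"COMPANY-STATEMENT": 0.9, "LEGAL": 0.1, "SCOPE-IN": 0.2},
    {"COMPANY-STATEMENT": 0.7, "LEGAL": 0.1, "SCOPE-IN": 0.6},
    {"COMPANY-STATEMENT": 0.8, "LEGAL": 0.3, "SCOPE-IN": 0.1},
    {"COMPANY-STATEMENT": 0.1, "LEGAL": 0.9, "SCOPE-IN": 0.1},
    {"COMPANY-STATEMENT": 0.1, "LEGAL": 0.95, "SCOPE-IN": 0.1},
    {"COMPANY-STATEMENT": 0.15, "LEGAL": 0.9, "SCOPE-IN": 0.1},
]

variances = variance_by_dominant_label(paragraphs)
for label, value in sorted(variances.items()):
    print(label, round(value, 4))
```

In this toy data the LEGAL group yields the smaller variance because its score vectors are nearly identical, mirroring the interpretation given in the note to Table 8.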

To provide hackers with clear and standardised policy documents, it is optimal to decrease the variance of the institutional elements communicated throughout. It was found that informal constraints communicated through REWARD-EVALUATION and PROHIBITED-ACTIONS have the most variance of all label categories. It was also found that the formal constraints communicated through LEGAL paragraphs show the least variance in the corpus, thus indicating that LEGAL paragraphs (as defined in Section 3.2 and by Laszka et al. [42]) are somewhat standardised and are less likely to contain a mixture of constraints.

Table 8 also allows for comparisons between the institutional elements found in the policies of stand-alone and platform-based programmes. Most notable is the difference in variation in the paragraphs concerning the communication of scope (SCOPE-IN and SCOPE-OUT) and submission guidelines (GUIDELINES-SUBMISSIONS). In the three categories, stand-alone programmes exhibit less variation, suggesting greater similarity across paragraphs. This may suggest that stand-alone programmes have less complex scope specifications and guidelines, and are therefore somewhat standardised in comparison to platform-based programmes.

The co-occurrence of all class labels with the dominant label for each paragraph, found using the output of the classification model, is used to represent the mix of topics discussed in a given policy paragraph. For example, of the 1,335 paragraphs that primarily convey information about the vulnerabilities that are ineligible (VULN-INELIGIBLE as the dominant label), there are 689 that also communicate out-of-scope assets or domains (SCOPE-OUT as a non-dominant label). Graphical representations of co-occurrence can be found in Appendix C. To aid comparison, the relationship between dominant labels and corresponding most common second labels is displayed in Table 9. Although there are many similarities between the two types of programme, the prevalence of COMPANY-STATEMENT as a second label for stand-alone programmes may suggest more generic policies.

Table 9.

Dominant Label	Second Label (Platform)	% (Platform)	Second Label (Stand-alone)	% (Stand-alone)
VULN-INELIGIBLE	SCOPE-OUT	51.5	SCOPE-OUT	52.0
COMPANY-STATEMENT	REWARD-EVALUATION	32.5	REWARD-EVALUATION	26.3
SCOPE-IN	COMPANY-STATEMENT	31.1	COMPANY-STATEMENT	38.9
LEGAL	COMPANY-STATEMENT	26.8	COMPANY-STATEMENT	41.4
ENGAGEMENT	COMPANY-STATEMENT	39.6	GUIDELINES-SUBMISSIONS	43.7
PARTICIPANT-RESTRICTIONS	COMPANY-STATEMENT	48.3	COMPANY-STATEMENT	50.0
REWARD-EVALUATION	SCOPE-IN	28.7	COMPANY-STATEMENT	47.5
PROHIBITED-ACTIONS	VULN-INELIGIBLE	47.2	VULN-INELIGIBLE	46.9
GUIDELINES-SUBMISSIONS	REWARD-EVALUATION	49.3	COMPANY-STATEMENT	54.3
SCOPE-OUT	SCOPE-IN	52.9	VULN-INELIGIBLE	68.1
GUIDELINES-DISCLOSURE	COMPANY-STATEMENT	48.0	COMPANY-STATEMENT	54.1
VULN-ELIGIBLE	SCOPE-IN	34.8	REWARD-EVALUATION	44.7

Table 9. Relationship Between Each Dominant Institutional Element and the Second Most Prevalent Element Within Each Paragraph for Both Platform-Based and Stand-alone Programmes

Percentages correspond to the proportion of dominant-label paragraphs that contain the ‘second label’ as the second most prevalent label. For example, for platform-based programmes, 51.5% of VULN-INELIGIBLE paragraphs have SCOPE-OUT assigned as the second most prevalent label. Graphical representations of the full co-occurrences can be seen in Appendix C.
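One way to arrive at a summary of this shape is sketched below. It assumes that each paragraph is represented as a mapping from label to classifier score (the scores shown are invented), ranks the labels for each paragraph, and reports, for every dominant label, the most frequent second-ranked label along with the percentage of paragraphs in which it appears in second place. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def rank_labels(scores):
    """Labels ordered from highest to lowest classifier score."""
    return sorted(scores, key=scores.get, reverse=True)


def most_common_second_label(paragraphs):
    """Map each dominant label to (most frequent second label, percentage)."""
    seconds = defaultdict(Counter)
    for scores in paragraphs:
        dominant, second = rank_labels(scores)[:2]
        seconds[dominant][second] += 1
    summary = {}
    for dominant, counter in seconds.items():
        label, count = counter.most_common(1)[0]
        summary[dominant] = (label, 100.0 * count / sum(counter.values()))
    return summary


# Invented scores for five paragraphs over a three-label subset.
paragraphs = [
    {"VULN-INELIGIBLE": 0.9, "SCOPE-OUT": 0.6, "LEGAL": 0.1},
    {"VULN-INELIGIBLE": 0.8, "SCOPE-OUT": 0.5, "LEGAL": 0.2},
    {"VULN-INELIGIBLE": 0.7, "SCOPE-OUT": 0.1, "LEGAL": 0.3},
    {"VULN-INELIGIBLE": 0.85, "SCOPE-OUT": 0.4, "LEGAL": 0.1},
    {"LEGAL": 0.9, "VULN-INELIGIBLE": 0.2, "SCOPE-OUT": 0.1},
]

summary = most_common_second_label(paragraphs)
for dominant, (second, share) in summary.items():
    print(f"{dominant}: {second} ({share:.1f}%)")
```

Running the same function separately over platform-based and stand-alone paragraphs would give the two column pairs of the table.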

4.3 Institutional Analysis

The results in Table 10 show the proportion of policy documents that contain each of the institutional elements of interest. Almost all organisations (98.7%) include general information about their security posture or CVD programme (COMPANY-STATEMENT) within the programme policy document. A majority of organisations will include information about the assets that are considered to be in-scope (74.2%) and out-of-scope (65.8%); however, this does not account for organisations that decide to provide scope details in a separate scope table. Perhaps of the greatest concern, a minority of organisations provide details of legal considerations (38.6%) and participant restrictions (31.7%) within the policy, requiring hackers to search elsewhere for information pertaining to such details (or otherwise remain in the dark).

Table 10.

Institutional Element	# (All)	% (All)	# (Platform)	% (Platform)	# (Stand-alone)	% (Stand-alone)
COMPANY-STATEMENT	1,724	98.7	1,139	99.2	585	97.7
REWARD-EVALUATION	1,447	82.8	1,041	90.7	406	69.5
GUIDELINES-SUBMISSIONS	1,335	76.4	918	80.0	417	67.8
VULN-INELIGIBLE	1,314	75.2	1,008	87.8	306	51.1
SCOPE-IN	1,297	74.2	996	86.8	301	50.3
SCOPE-OUT	1,150	65.8	918	80.0	232	38.7
PROHIBITED-ACTIONS	1,120	64.1	858	74.7	262	43.7
VULN-ELIGIBLE	1,102	63.1	874	76.1	228	38.1
GUIDELINES-DISCLOSURE	991	56.7	698	60.8	293	48.9
ENGAGEMENT	965	55.2	709	61.8	256	42.7
LEGAL	675	38.6	538	46.9	137	22.9
PARTICIPANT-RESTRICTIONS	554	31.7	477	41.6	77	12.9

Table 10. Proportion of Policy Documents, Either Across the Entire Corpus or Grouped by Programme Type (Platform-Based or Stand-alone) That Contain at Least One Example of a Given Institutional Element Amongst the Constituent Paragraphs

Percentages are reported relative to the number of programmes in a given category.
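The document-level aggregation can be illustrated with a short sketch. It assumes that each policy is a list of per-paragraph score mappings and that an element counts as present in a paragraph when its score reaches a threshold; the threshold of 0.5, the policy identifiers, and the scores are all assumptions made for illustration rather than details taken from the study.

```python
from collections import Counter


def element_coverage(policies, threshold=0.5):
    """For each label, return (number of policies, percentage of policies)
    with at least one paragraph scoring at or above the threshold."""
    total = len(policies)
    if total == 0:
        return {}
    counts = Counter()
    for paragraphs in policies.values():
        present = {
            label
            for scores in paragraphs
            for label, score in scores.items()
            if score >= threshold
        }
        # Each policy contributes at most once per label.
        counts.update(present)
    return {label: (n, round(100.0 * n / total, 1)) for label, n in counts.items()}


# Hypothetical policies, each a list of paragraph-level classifier outputs.
policies = {
    "policy-a": [
        {"COMPANY-STATEMENT": 0.9, "LEGAL": 0.2},
        {"COMPANY-STATEMENT": 0.3, "LEGAL": 0.8},
    ],
    "policy-b": [{"COMPANY-STATEMENT": 0.7, "LEGAL": 0.1}],
    "policy-c": [
        {"COMPANY-STATEMENT": 0.95, "LEGAL": 0.4},
        {"COMPANY-STATEMENT": 0.6, "LEGAL": 0.45},
    ],
    "policy-d": [{"COMPANY-STATEMENT": 0.2, "LEGAL": 0.1}],
}

coverage = element_coverage(policies)
for label, (count, share) in sorted(coverage.items()):
    print(label, count, share)
```

Because presence is decided per document rather than per paragraph, a single matching paragraph is enough for a policy to count towards an element's total.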

Across all institutional elements, a higher proportion of platform-based programmes contained a given element in comparison to stand-alone programmes, thus further demonstrating the disparities between the two in the context of the information contained within policy documents. Perhaps of particular concern are the differences between the proportion of programmes that include information on in-scope assets (86.8% versus 50.3%), out-of-scope assets (80.0% versus 38.7%), and prohibited actions (74.7% versus 43.7%). Ambiguity can arise without clearly defined boundaries and may lead to hackers accidentally performing undesirable actions [92].

5 Discussion

A discussion of the constraints that are imposed upon hackers that participate in CVD programmes is presented in Section 5.1 (formal constraints) and Section 5.2 (informal constraints), outlining many of the shortcomings that are commonplace within the sections of CVD policy documents. This is complemented by further discussion in Section 5.3 of the overarching gaps that afflict many of the policies collected as part of this study. Finally, in Section 5.4, we discuss the limitations of the study and potential drawbacks of the methodology. These limitations primarily stem from incomplete data coverage, annotation biases, and model under-performance.

5.1 Formal Constraints

It is clear from the results presented in Section 4.1 that hackers are presented with, and are subject to, a wealth of applicable legal agreements and language. As discussed by Elazari [20], the inclusion of generic formal constraints (e.g., “you must comply with all relevant local, state, national or international laws”) places a considerable burden on the hacker to understand, across a number of jurisdictions, the laws and regulations that they must navigate during the process of vulnerability discovery and disclosure. Furthermore, unresolved conflicts may exist within the myriad applicable contracts, potentially exposing hackers to civil or criminal liabilities [20].

The advent of cloud computing, and the continued use of the technology for the storage of personal data [2, 31], may also add further complexity to the process of lawful vulnerability research. Hackers may have difficulty identifying applicable laws if the locations of data centres are not made obvious [13]. Irrespective of the location of the hackers, organisations should endeavour to clearly communicate the applicable jurisdictions to those wishing to participate in a programme.

An earlier study by the National Telecommunications and Information Administration found that the threat of legal action impacted the disclosure decisions of 60% of surveyed hackers [80]. It is, perhaps, necessary to further simplify and standardise the legal language used within CVD policy documents to clarify the legal position afforded to hackers conducting legitimate, ‘good faith’ security research.

When considering the specific constraints referenced in policy documents, the CFAA and DMCA are most prevalent. This indicates the relevance of U.S. federal and copyright laws to hackers across the world, and highlights the importance of providing authorisation to hackers under both acts in Safe Harbour clauses [21]. Previous research has highlighted the fear amongst hackers of the legal risk brought about by the CFAA and DMCA when conducting security research [76]. Zhao et al. [100] encourage a change in regulatory policies (e.g., the DMCA) to help protect those conducting legitimate security research.

As noted in Section 4.1.1, aside from the individual programme policies published by programme operators, the users of bug bounty platforms (both the customer organisations and the hackers) will be subject to the particular policies set forth by the platform operator. Depending on the nature of the services provided by the platform, these may supersede those of the programme. For example, the payments for bounties awarded by programme operators on the HackerOne platform are facilitated by HackerOne.²¹ As a U.S. company, they are unable to provide payments to any sanctioned individuals, or individuals normally resident in a country under U.S. sanction (see Section 2.2). In the case of issuing payments, the participant restrictions of the payment facilitator (e.g., HackerOne) will have precedence over individual programme policies, unless an organisation chooses to bypass the platform and issue payments by other means. This may cause an absence of payment restriction clauses for programmes hosted on U.S.-based platforms. The impact of sanctions is not limited to U.S.-based organisations. For example, sanctions imposed by the UK Department for International Trade (trade sanctions) and HM Treasury (financial sanctions) may impact the unrestricted flow of vulnerability information and payments by and between British organisations and hackers [78]. Although not investigated here, the ever-shifting geopolitical landscape may necessitate periodic updating of payment or participation restriction clauses in policy documents to ensure that correct and up-to-date information is communicated to hackers.

Certain platform policies may also exist alongside those published by the programme operator. Both HackerOne and Bugcrowd publish hacker ‘codes of conduct’²² that outline acceptable behaviour when participating in programmes and communicating with other parties on the platform. Failure to comply with the code may result in punishment, yet the severity of the platform punishments in many cases appears incongruous with the potential legal ramifications following investigation by the programme operator. For example, it may take four violations of ‘service degradation/unsafe testing’ on HackerOne before an account is permanently banned. Hackers should be aware of the policies that exist on bug bounty platforms, and future research should endeavour to consider the interplay between platform and programme policies.

When considering the specification of formal constraints, there is a clear disparity between platform-based and stand-alone programmes (see Table 7), with platform-based programmes more likely to include generic and specific references to formal constraints. As reported by Walshe and Simpson [92], the operators of BBPs note that the operation of a programme on a platform often represents a more mature security activity than that of many stand-alone programmes. The relative immaturity of many stand-alone programmes in comparison to platform-based programmes may explain, in part, the lack of attention placed on including constraints in a programme’s policy. However, this does not explain the large variation between platforms.

5.2 Informal Constraints

A study of information security policies of the top 200 U.S. universities (of which 54% had a publicly accessible policy) by Weidman and Grossklags [94] considered, among other factors, the readability and length of policy documents. Interestingly, they find that information security policies are far less readable than the policies found in CVD programmes (average Flesch reading ease score of 12.54 versus an average of 35.3). They note that a score of 0 to 30 is suitable for university graduates and therefore may be difficult to parse for the individuals at a university without a degree. A score of 30 to 50 is suitable for university students (non-graduates). However, as noted by Akgul et al. [4], many hackers have not reached this level of educational attainment. Furthermore, many hackers do not have English as a first language [29], perhaps adding to the difficulty of understanding the information contained within CVD policies. The commonality in relative readability (given the target audience), or lack thereof, of university information security policies and CVD programme policies suggests that the communication of technical policies may be ill suited to the audience.

Analysing the institutional elements reveals that certain informal constraints have significant variation between programmes. Although this may be expected of scope definitions, company statements and specific access instructions (e.g., those in the ENGAGEMENT category)—all of which will vary due to the peculiarities of a particular programme and operating organisation—a lack of commonality in the requirements for rewards, instructions for report submission, and methods by which reports are assessed may lead to hackers missing the nuances of a particular programme.

5.3 Institutional Analysis

The inclusion of specific institutional elements varies considerably between programmes; as noted in Section 4.3, the difference is particularly evident between stand-alone programmes and those hosted on a bug bounty platform (see Table 10). Although policy documents are used to help hackers understand the constraints that govern their actions, many organisations fail to explicitly define the gamut of formal and informal constraints with which they wish hackers to comply. The lack of information available to hackers, or lack of clarity within the information provided, may result in the submission of sub-optimal vulnerability reports, exacerbating the issue of low-quality reports facing many organisations [5, 92].

Furthermore, when a policy is considered a legally binding contract, the absence of certain information, or ambiguity of important information, may introduce additional risks and liabilities for organisations and hackers alike [67]. For instance, the absence of sufficient Safe Harbour clauses weakens the enforceability of the contract [23] while also exposing hackers to risks due to a lack of authorisation.

Throughout the results presented in this study, a common theme is that of a lack of information communicated via the policies of stand-alone programmes in comparison to those belonging to programmes hosted on bug bounty platforms. Previous qualitative work has focused on exploring the challenges faced by the operators of both programme types, revealing significant differences in operating characteristics. Through the use of surveys and interviews, Al-Banna et al. [5], and later Walshe and Simpson [92], reveal that organisations may choose to make use of a bug bounty platform, such as HackerOne, where there is not the required in-house knowledge to set up and run a CVD programme. Instead, individuals from the platform assist the organisation in setting scopes, setting bounties, creating policies, and managing the triage of incoming reports, and will use similar organisations or existing norms as a point of reference [92]. Having the assistance of the platform operators may explain, in part, the significant differences between platform-based and stand-alone programmes, as organisations on a platform have guidance far more readily available. It may also account for the lower variance exhibited across certain institutional elements, as certain types of information may be more homogeneous across the same bug bounty platform.

Furthermore, it is reported by some programme operators that the increased visibility that comes with a platform can lead to unmanageable volumes of incoming reports, many of which are of low or no value [92, 100]. The employment of more complete and robust policies can help ease the burden on the operators [92], and may explain the higher prevalence of information pertaining to out-of-scope assets, prohibited actions, and ineligible vulnerabilities in the policies of platform-based programmes.

5.4 Limitations

There are several limitations to the study presented in this article, which we document here.

First, although a wide sample of programme policies were sought for analysis—drawn from stand-alone programmes and those appearing on a wide range of bug bounty platforms—the coverage is incomplete. This will be particularly problematic in the case of stand-alone programmes, for which only 555 policies could be obtained. As such, the results may not be reflective of the wider body of programme policies.

Second, the quality of the annotated data may be limited by the annotator’s understanding of the underlying data and their attentiveness to the presence of entities or ability to determine the most appropriate categorical label for a passage of text. It should be recognised that the performance of models trained over data that is incorrectly annotated, or contains incorrect samples, may be degraded [96].

Third, as the analysis is reliant on the performance of the NER and text multi-classification deep-learning models, poor model performance may lead to incomplete or incorrect conclusions being drawn. As shown in Tables 5 and 6, the results for both the classification and NER models exhibit significant variability between classes and entity types. Certain types, such as SCOPE-OUT (classification) and PERSON (NER), perform particularly poorly. This is, in part, due to the low number of annotations and the inability of the models to generalise fully across all types. Future research making use of the models published as part of the study should consider the limitations of their performance, and perhaps attempt to improve performance through the annotation of additional samples or architectural refinement (e.g., using a pre-trained transformer).

6 Conclusion

Motivated by the question What information do organisations convey to hackers through public CVD policy documents?, we have described the use of deep learning models to assist in the understanding of CVD programme policy documents in the context of institutional economic theory. Analysis of thousands of policies, collected from stand-alone programmes and programmes hosted on 13 bug bounty platforms, reveals significant variability in policy content and gaps in information where it may be most needed. It is hoped that a better understanding of the institutional elements that commonly form policy documents will enable organisations to better convey the required, or desired, information to hackers that wish to safely participate in the search for vulnerabilities in an organisation’s assets. Furthermore, through the use of the fine-tuned models made publicly available, it is hoped that hackers will gain a greater awareness of the elements within the policy documents they encounter, and help better inform their decisions surrounding safe participation in a particular programme given the information available, or become more aware of that which is missing. A summary of the findings to the subsidiary research questions presented in Section 2.3 is as follows:

(1)

Within CVD policy documents, hackers are exposed to a litany of generic (e.g., ‘applicable laws’) and specific (e.g., ‘DMCA’) references to legal constraints by which they must abide. Understanding the applicable laws and regulations places a considerable burden upon hackers, which could be alleviated by further simplification and standardisation of the legal language used throughout policy documents.

(2)

Considerable variation exists across programmes within the requirements for rewards, instructions for report submission, and the methods by which reports are assessed. It may be prudent for organisations to emphasise any deviations from, or choose to align with, standard practices within these areas to avoid potential misunderstandings with hackers.

(3)

Organisations, particularly those operating on a bug bounty platform, typically include information on a programme’s scope, rewards, and submission guidelines. However, a majority of programmes fail to include pertinent legal information, potentially introducing hackers to additional risks and liabilities [67]. Furthermore, there is often a lack of specificity as to the restrictions on participants due to their background.

Although collected as part of the study, a significant amount of data is yet to be analysed. It may be that the investigation of this data could underpin future work by the wider research community. Questions such as How do organisational CVD policies change over time?, What is the interplay between policy changes, bounty updates, and the activity of hackers?, and Are limited time promotional events an effective method to motivate hackers? may be investigated via these datasets. We would argue that, in addition to the results presented in this article, such investigations could help provide valuable insights into the ways in which organisations might attempt to bring about behavioural change through institutional change.

To aid those involved in the search for vulnerabilities, we encourage engineers to build easily accessible (for those less familiar with running the NLP models) web-based tools using the published models and data, helping to automatically highlight the pertinent information contained in, or missing from, often hard-to-read policy documents. It is hoped that such tools could help individuals better understand the risks and requirements that are explicitly or implicitly present when participating in a particular CVD programme. Furthermore, we encourage programme operators and platforms to build upon and use the models to critically analyse their own policies, identify areas of uncertainty and ambiguity, and address any shortcomings that might be present to help encourage safer, and better-defined, participation for hackers.

Acknowledgments

The authors would like to thank the reviewers for their insightful and helpful comments. This research was undertaken as part of the Data and Models for Secure Software Engineering project, funded by the UK’s National Cyber Security Centre.

Footnotes

¹

In the economic sense of reducing the social costs of poor organisational security [40].

²

Found at https://www.bugcrowd.com/resources/report/inside-the-mind-of-a-hacker/.

³

https://immunefi.com/bounty/wormhole/.

⁴

All material relating to the study can be found at https://github.com/walshe96/cvd-policy-documents.

⁵

See: https://github.com/disclose/bug-bounty-platforms.

⁶

https://www.hackerone.com.

⁷

https://www.bugcrowd.com.

⁸

https://immunefi.com.

⁹

https://hackenproof.com.

¹⁰

Using the external programmes filter on https://hackerone.com/directory/ and from https://github.com/disclose/diodb.

¹¹

https://github.com/walshe96/cvd-policy-documents.

¹²

Documentation found at https://www.nltk.org/api/nltk.tokenize.html.

¹³

https://prodi.gy.

¹⁴

https://spacy.io.

¹⁵

https://github.com/walshe96/cvd-policy-documents.

¹⁶

https://github.com/walshe96/cvd-policy-documents.

¹⁷

From https://www.bugcrowd.com/terms-and-conditions/.

¹⁸

For an example, see https://www.hackerone.com/terms/general.

¹⁹

For an example, see https://www.hackerone.com/terms/finder.

²⁰

From https://www.bugcrowd.com/termsandconditions/.

²¹

Policy found at https://www.hackerone.com/disclosure-guidelines.

²²

For an example, see https://www.hackerone.com/policies/code-of-conduct.

²³

Found at https://www.cs.mcgill.ca/~jpineau/ReproducibilityChecklist.pdf.

²⁴

https://huggingface.co.

²⁵

Template found at https://github.com/huggingface/datasets/blob/main/templates/README.md.

A Discussion of Replicability and Reproducibility

A.1 Background

Across academic domains, there is growing discussion surrounding the replicability and reproducibility of results put forward in published literature that stems from the employment of, often black-box, ML models [9]. Using the definitions provided by Liu et al. [46], replicability refers to the ability to reproduce the reported experimental results using the same models and data, and reproducibility refers to the ability to apply the same model and protocols to other, previously unseen, datasets and yield similar findings [46]. This is represented in Table 11.

Table 11.

		Data
		Same	Different
Approach	Same	Reproducible	Replicable
Approach	Different	Robust	Generalisable

Table 11. Terminology Used to Describe Experiments That Seek to Vary the Data or Modelling Approach of the Original Study

Adapted from Pineau et al. [63] and https://github.com/WhitakerLab/ReproducibleResearch.

A plethora of reasons inhibit the ability of researchers to validate findings that rely upon complex ML models, including lack of published source code [28], lack of published trained models [39], lack of published hyper-parameters or variable descriptions [28], lack of published random seeds [33], use of model hacking (cherry-picking the single best model from many trained models) [57], unknown train and test splits in the data [39], unknown occurrences of data leakage [48, 61], uncontrollable randomness [6], hardware constraints (an issue particularly prevalent with large language model research in the field of NLP [62, 98]), lack of development environment details [10], and code rot [25]. Furthermore, leveraging novel or non-public datasets raises additional concerns about the validity of the claims put forward, as replication may be difficult, if not infeasible, for researchers external to the original study [9].
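Two of the obstacles listed above, unpublished random seeds and unknown train and test splits, can be mitigated with very little code. The following sketch is illustrative only and does not describe the procedure used in this study: the example identifiers, the seed value, and the function names are hypothetical. It assigns each example to a split by hashing its identifier, so the split can be recomputed by anyone holding the data, and it confines shuffling to an explicitly seeded generator. It does not address non-determinism arising from hardware or numerical libraries.

```python
import hashlib
import random


def split_bucket(example_id: str, test_fraction: float = 0.2) -> str:
    """Assign an example to 'train' or 'test' from a hash of its identifier,
    so the assignment does not depend on ordering or on random state."""
    digest = hashlib.sha256(example_id.encode("utf-8")).digest()
    position = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "test" if position < test_fraction else "train"


def seeded_shuffle(items, seed):
    """Return a shuffled copy using a private, explicitly seeded generator."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical paragraph identifiers.
example_ids = [f"policy-{i:03d}/paragraph-0" for i in range(10)]

splits = {example_id: split_bucket(example_id) for example_id in example_ids}
order = seeded_shuffle(example_ids, seed=13)

print(splits)
print(order)
```

Publishing the seed and the splitting rule alongside the data allows the same partitions and ordering to be regenerated without distributing separate split files.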

Sonnenburg et al. [68] propose six underlying reasons that may contribute to a lack of published software or open sourcing alongside paper publication. These include misunderstanding that the software/code is not part of the scientific contribution, misconceptions surrounding licensing, lack of incentives to publish the source code, a reluctance to publish ‘bad’ or messy code, perceived risk of critique from other researchers, and a tendency for reviewers to allow for paper publication without the requirement of source code to be provided. In the context of software engineering literature, Liu et al. [46] found that 74.2% of studies failed to provide sufficient source code and data, and only 10.8% of studies discussed the impact on replicability and reproducibility.

A.2 Best Practices

In 2019, the NeurIPS conference for research in ML introduced a reproducibility programme to help promote ‘open and accessible research’ through the use of revised code submission policies, a reproducibility challenge (a conference track, and successor to the original challenge presented at ICLR 2018, focused on replicating the experiments of publications in the wider conference), and the creation of a reproducibility-focused submission checklist [63]. For practitioners, the checklist²³ contains 21 high-level checks across five domains, enabling them to help ensure the reproducibility of their work.

Although aimed at the life sciences, the general ML guidance put forward by Heil et al. [32] outlines a series of standards (bronze, silver, and gold) that describe the use of seven best practices for increasing levels of trustworthy analyses.

To increase transparency with public datasets, the repositories on Hugging Face²⁴ (a private company that hosts ML models and relevant datasets) are encouraged to contain a dataset card²⁵ with pertinent information (data sources, splits, annotation process, etc.) that may be of interest to researchers looking to leverage the dataset for their own work, or to replicate other studies that make use of the dataset.

A.3 Evaluation of Replicability and Reproducibility

The three aforementioned frameworks of best practices are used to evaluate and communicate the extent to which the results are replicable and reproducible, and to provide necessary information on the models and datasets. For conciseness, only the standards proposed by Heil et al. [32] are considered in Table 12; the Reproducibility Checklist [63] and the Hugging Face dataset information card are provided in the GitHub repository.

Best Practice	Comments
Data published and downloadable	Yes: All datasets, both raw and annotated, are available in the repository.
Models published and downloadable	Yes: Trained spaCy models are available in the repository.
Source code published and downloadable	Yes: Python source code files for the data processing pipelines are available in the repository.
Dependencies set up in a single command	Yes: A requirements.txt is provided in the repository, allowing for an environment to be quickly set up with the required dependencies. However, the inclusion of a software container may allow for greater flexibility in the future.
Key analysis details recorded	Yes: A README file is provided in the repository that outlines all instructions required to reproduce the entire analysis.
Analysis components set to deterministic	Unsure: As a GPU was unavailable during the analysis period, it cannot be confirmed whether training and inference using the spaCy library on a GPU will contain only deterministic components.
Entire analysis reproducible with a single command	No: For added clarity, the reproducible steps for the NER and text classification are split into two distinct commands.

Table 12. Evaluation of Study Against the Best Practices of Heil et al. [32]

As determinism cannot be guaranteed for varying hardware setups, the study meets the bronze standard for replicability.
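
To illustrate what a deterministic analysis component can look like in practice, the following minimal Python sketch fixes the seed of a train/test split and compares fingerprints of two runs. It is an illustration only and is not taken from the study's repository: the function names `seeded_split` and `fingerprint`, the seed value, and the document identifiers are all hypothetical. For spaCy's own random number generators, `spacy.util.fix_random_seed` serves a similar purpose, although seeding alone may not remove all non-determinism on a GPU.

```python
import hashlib
import random


def seeded_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of `items` with a dedicated, seeded RNG and split it."""
    rng = random.Random(seed)  # local RNG, unaffected by other code's randomness
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled items form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


def fingerprint(items):
    """Hash a sequence of strings so that two runs can be compared cheaply."""
    digest = hashlib.sha256()
    for item in items:
        digest.update(item.encode("utf-8"))
        digest.update(b"\x00")  # separator so ["ab", "c"] differs from ["a", "bc"]
    return digest.hexdigest()


# Hypothetical identifiers standing in for policy documents.
docs = [f"policy-{i:03d}" for i in range(50)]

train_a, test_a = seeded_split(docs, seed=42)
train_b, test_b = seeded_split(docs, seed=42)

# Two runs with the same seed should yield identical splits.
same = (
    fingerprint(train_a) == fingerprint(train_b)
    and fingerprint(test_a) == fingerprint(test_b)
)
```

Publishing the seed alongside such fingerprints would allow an external researcher to confirm that they have recovered the same split before comparing results.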

B Examples of Taxonomy Elements

The following examples of the elements set forth in the taxonomy produced by Laszka et al. [42] have been collected from public CVD programmes and serve to better communicate the elements that make up policy documents:

• SCOPE-IN: “The following PayPal brands are in scope: PayPal, Venmo, ...” Example from PayPal’s programme on HackerOne.

• SCOPE-OUT: “Security issues discovered in the AWS IP Space are not in scope for Amazon Vulnerability Research Program. As an infrastructure provider, AWS customers operate assets in this space. Discovering and testing against AWS and AWS customer assets is strictly out of scope for Amazon Vulnerability Research Program and against the AWS AUP”. Example from Amazon’s programme on HackerOne.

• VULN-ELIGIBLE: “Example Topics of Interest: Escalation of Privilege, Information disclosure, Denial of Service, ...” Example from Intel’s programme on Intigriti.

• VULN-INELIGIBLE: “The Netflix Bug Bounty program follows Bugcrowd’s Vulnerability Rating Taxonomy with some additional vulnerability classes we consider to be excluded below: ...” Example from Netflix’s programme on Bugcrowd.

• ENGAGEMENT: “Use your [username]@wearehackerone email alias when testing or reporting bugs”. Example from Deliveroo’s programme on HackerOne.

• PROHIBITED-ACTIONS: “Do not try to exploit service providers we use, prohibited actions include, but are not limited to bruteforcing login credentials of Domain Registrars, DNS Hosting Companies, Email Providers and/or others. The Firm does not authorize you to perform any actions to a non-GS owned property/system/service/data”. Example from Goldman Sachs’ programme on HackerOne.

• LEGAL: “If legal action is initiated by a third party against you for conduct that Meta determines to have complied with these Bug Bounty Programme Terms, Meta will take steps to make it known, either to the public or the court, that your actions were authorised under this program”. Example from Meta’s stand-alone programme.

• PARTICIPANT-RESTRICTIONS: “We are unable to issue rewards to individuals who are on sanctions lists, or who reside in countries (e.g., Cuba, Iran, North Korea, Syria, Crimea, and the so-called Donetsk People’s Republic and Luhansk People’s Republic) on sanctions lists”. Example from Google’s stand-alone programme.

• GUIDELINES-SUBMISSIONS: “Provide details of the vulnerability, including information needed to reproduce and validate the vulnerability and a Proof of Concept (POC). Any vulnerability that implicates functionality not resident on a research-registered vehicle must be reported within 168 hours and zero minutes (7 days) of identifying the vulnerability”. Example from Tesla’s stand-alone programme.

• GUIDELINES-DISCLOSURE: “You may not use, disclose or distribute any such Confidential Information, including without limitation any information regarding your Bug Bounty submitted report, without our prior explicit consent. You must get explicit consent by submitting a disclosure request to our program. Please note, not all requests for public disclosure will be approved.” Example from Uber’s programme on HackerOne.

• REWARD-EVALUATION: “Bounty payments are determined by the level of access or execution achieved by the reported issue, modified by the quality of the report. A maximum amount is set for each category. The exact payment amounts are determined after review by Apple.” Example from Apple’s stand-alone programme.

• COMPANY-STATEMENT: “Microsoft strongly believes close partnerships with researchers make customers more secure. Security researchers play an integral role in the ecosystem by discovering vulnerabilities missed in the software development process. Each year we partner together to better protect billions of customers worldwide”. Example from Microsoft’s stand-alone programme.

C Co-occurrence of Paragraph Class Labels

Figure 4 shows the co-occurrence of all labels with the dominant label for all paragraphs from the output of the classification model. Figures 5 and 6 show the label co-occurrence for platform-based and stand-alone programmes, respectively.

Fig. 4.

Fig. 5.

Fig. 6.
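
The co-occurrence counts described in this appendix could be computed along the following lines. This is a minimal sketch under stated assumptions rather than the authors' implementation: it assumes each paragraph is represented by a dictionary of label scores (similar in shape to the `doc.cats` dictionary produced by spaCy text classifiers), that the dominant label is the highest-scoring one, and that a label co-occurs when its score reaches a hypothetical threshold of 0.5. The function name and the example scores are invented for illustration.

```python
from collections import Counter, defaultdict


def cooccurrence_with_dominant(paragraph_scores, threshold=0.5):
    """Count, for each dominant label, how often other labels co-occur with it.

    `paragraph_scores` is a list of dicts, one per paragraph, mapping
    label -> score. The threshold is an assumption made for this sketch.
    """
    counts = defaultdict(Counter)
    for scores in paragraph_scores:
        if not scores:
            continue  # skip paragraphs with no predictions
        dominant = max(scores, key=scores.get)  # highest-scoring label
        for label, score in scores.items():
            if label != dominant and score >= threshold:
                counts[dominant][label] += 1
    return counts


# Hypothetical classifier output for four paragraphs.
paragraphs = [
    {"SCOPE-IN": 0.91, "SCOPE-OUT": 0.62, "LEGAL": 0.03},
    {"SCOPE-IN": 0.88, "SCOPE-OUT": 0.10, "LEGAL": 0.05},
    {"LEGAL": 0.95, "PROHIBITED-ACTIONS": 0.71, "SCOPE-IN": 0.02},
    {"PROHIBITED-ACTIONS": 0.80, "LEGAL": 0.66, "SCOPE-OUT": 0.55},
]

counts = cooccurrence_with_dominant(paragraphs)
for dominant, others in sorted(counts.items()):
    print(dominant, dict(others))
```

Splitting the input by programme type before calling the function would give separate counts for platform-based and stand-alone programmes.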

References

[1] Hervé Abdi and Lynne J. Williams. 2010. Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics 2, 4 (2010), 433–459.
