Hateful and antagonistic content published and propagated via the World Wide Web has the potential to cause harm and suffering on an individual basis, and lead to social tension and disorder beyond cyber space. Despite new legislation aimed at prosecuting those who misuse new forms of communication to post threatening, harassing, or grossly offensive language (cyber hate), and the fact that large social media companies have committed to protecting their users from harm, cyber hate goes largely unpunished due to difficulties in policing online public spaces. To support the automatic detection of cyber hate online, specifically on Twitter, we build multiple individual models to classify cyber hate for a range of protected characteristics including race, disability and sexual orientation. We use text parsing to extract typed dependencies, which represent syntactic and grammatical relationships between words, and are shown to capture 'othering' language, consistently improving machine classification for different types of cyber hate beyond the use of a Bag of Words and known hateful terms. Furthermore, we build a data-driven blended model of cyber hate to improve classification where more than one protected characteristic may be attacked (e.g. race and sexual orientation), contributing to the nascent study of intersectionality in hate crime.
This paper specifies, designs and critically evaluates two tools for the automated identification of demographic data (age, occupation and social class) from the profile descriptions of Twitter users in the United Kingdom (UK). Metadata routinely collected through the Collaborative Social Media Observatory (COSMOS: http://www.cosmosproject.net/) relating to UK Twitter users is matched with the occupational lookup tables between job and social class provided by the Office for National Statistics (ONS) using SOC2010. Using expert human validation, the validity and reliability of the automated matching process is critically assessed and a prospective class distribution of UK Twitter users is offered with 2011 Census baseline comparisons. The pattern matching rules for identifying age are explained and enacted following a discussion on how to minimise false positives. The age distribution of Twitter users, as identified using the tool, is presented alongside the age distribution o...
A new architecture of database federation called the MDSSF (Multiple Database Search Service Federation) is presented to support the procurement activities of AEC (Architecture, Engineering and Construction) industry projects. In order to make procurement decisions, a contractor requires access to product information from several different product suppliers when constructing artefacts such as a hospital or an office block. This
The requirements for collaborative services, especially pertaining to order and delivery, are quite different from those of traditional distributed applications. The NaradaBrokering messaging substrate enables scalable, fault-tolerant, distributed interactions between entities, and is based on the publish/subscribe paradigm. The substrate also incorporates support for Grid and Web Services. More recently, we have incorporated services within the substrate which enable us to
2013 IEEE Sixth International Conference on Cloud Computing, 2013
Over recent years, there has been an emerging interest in supporting social media analysis for marketing, opinion analysis and understanding community cohesion. Social media data conforms to many of the categorisations attributed to “big-data” – i.e. volume, velocity and variety. Generally analysis needs to be undertaken over large volumes of data in an efficient and timely manner. A variety of computational infrastructures have been reported to achieve this. We present the COSMOS platform supporting sentiment and tension analysis on Twitter data, and demonstrate how this platform can be scaled using the OpenNebula Cloud environment with Map/Reduce-based analysis using Hadoop. In particular, we describe the types of system configurations that would be most useful from a performance perspective – i.e. how virtual machines in the infrastructure should be distributed to reduce variability in the analysis performance. We demonstrate the approach using a data set consisting of several million Twitter messages, analysed over two types of Cloud infrastructure
In the Architecture/Engineering/Construction (A/E/C) industry, large projects are tackled by consortia of companies and individuals, who work collaboratively for the duration of the project. Such projects are complex and consortia members provide a range of skills to the project. The running of these A/E/C industry projects requires the formation of secure virtual organisations to enable collaboration. An important feature of the consortia is that they are dynamic in nature and are formed for the lifetime of the project. Members can ...
Papers by Pete Burnap