Eagle: Ethical dataset given from real interactions

M Kaneko, D Bollegala, T Baldwin - arXiv preprint arXiv:2402.14258, 2024 - arxiv.org
Recent studies have demonstrated that large language models (LLMs) exhibit ethics-related problems such as social biases, a lack of moral reasoning, and the generation of offensive content. Existing evaluation metrics and mitigation methods for these ethical challenges rely on datasets created by explicitly instructing humans to write instances that contain ethical problems. Such data therefore does not reflect the prompts users actually provide when using LLM services in everyday contexts, and may not lead to the development of safe LLMs that can address the ethical challenges arising in real-world applications. In this paper, we create the Eagle dataset, extracted from real interactions between ChatGPT and its users, which exhibit social biases, toxicity, and immoral problems. Our experiments show that Eagle captures complementary aspects not covered by existing datasets proposed for the evaluation and mitigation of such ethical challenges. Our dataset is publicly available at https://huggingface.co/datasets/MasahiroKaneko/eagle.
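Since the dataset is hosted on the Hugging Face Hub, it can presumably be loaded with the standard Hugging Face `datasets` library. The sketch below assumes that; the abstract does not specify splits or field names, so the code only inspects whatever schema the Hub repository defines.

    # Minimal sketch: loading the Eagle dataset via the Hugging Face
    # `datasets` library. Splits and fields are not documented in the
    # abstract, so we simply inspect what the repository provides.
    from datasets import load_dataset

    ds = load_dataset("MasahiroKaneko/eagle")

    # Show the available splits and their features.
    print(ds)

    # Print a few instances from the first available split.
    split = next(iter(ds))
    for example in ds[split].select(range(3)):
        print(example)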