[Question] Running the quick-start example python scripts/train.py -c examples/bert_crf/configs/resume.yaml fails with "An error occurred while generating the dataset" #47

Gsq6161 · 2024-09-27T13:39:01Z

What is your question?

I'm a complete beginner. I set up AdaSeq locally by following the steps in the repository, and the run fails at this code in datasets:

    except Exception as e:
        # Ignore the writer's error for no examples written to the file if this error was caused by the error in _generate_examples before the first example was yielded
        if isinstance(e, SchemaInferenceError) and e.__context__ is not None:
            e = e.__context__
        raise DatasetGenerationError("An error occurred while generating the dataset") from e

How can I fix this?

What have you tried?

Downgrading torch and datasets did not help.

Code (if necessary)

(adaseq) PS C:\Users\Acer\Desktop\AdaSeq-master> python scripts/train.py -c examples/bert_crf/configs/resume.yaml
2024-09-27 21:32:46,554 - modelscope - WARNING - The reference has been Deprecated in modelscope v1.4.0+, please use from modelscope.msdatasets.dataset_cls.custom_datasets import TorchCustomDataset
2024-09-27 21:32:47,201 - INFO - adaseq.data.dataset_manager - Will use a custom loading script: E:\Anaconda\envs\adaseq\lib\site-packages\adaseq\data\dataset_builders\named_entity_recognition_dataset_builder.py
Downloading data: 135kB [00:00, 2.86MB/s]
Downloading data: 1.09MB [00:00, 10.4MB/s]
Downloading data: 120kB [00:00, 2.56MB/s]
Generating test split: 0 examples [00:00, ? examples/s]
Traceback (most recent call last):
File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1739, in _prepare_split_single
writer = writer_class(
File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\arrow_writer.py", line 338, in __init__
self.stream = self._fs.open(path, "wb")
File "E:\Anaconda\envs\adaseq\lib\site-packages\fsspec\spec.py", line 1303, in open
f = self._open(
File "E:\Anaconda\envs\adaseq\lib\site-packages\fsspec\implementations\local.py", line 191, in _open
return LocalFileOpener(path, mode, fs=self, **kwargs)
File "E:\Anaconda\envs\adaseq\lib\site-packages\fsspec\implementations\local.py", line 355, in __init__
self._open()
File "E:\Anaconda\envs\adaseq\lib\site-packages\fsspec\implementations\local.py", line 360, in _open
self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/Acer/.cache/huggingface/datasets/named_entity_recognition_dataset_builder/default-84b1c02799fb57ba/0.0.0/db737b9bb893f20fb03d04403a30bf7c033256c212b7e9f0ebc6e9c958535c51.incomplete/named_entity_recognition_dataset_builder-test-00000-00000-of-NNNNN.arrow'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\Acer\Desktop\AdaSeq-master\scripts\train.py", line 39, in <module>
train_model_from_args(args)
File "E:\Anaconda\envs\adaseq\lib\site-packages\adaseq\commands\train.py", line 84, in train_model_from_args
train_model(
File "E:\Anaconda\envs\adaseq\lib\site-packages\adaseq\commands\train.py", line 156, in train_model
trainer = build_trainer_from_partial_objects(
File "E:\Anaconda\envs\adaseq\lib\site-packages\adaseq\commands\train.py", line 185, in build_trainer_from_partial_objects
dm = DatasetManager.from_config(task=config.task, **config.dataset)
File "E:\Anaconda\envs\adaseq\lib\site-packages\adaseq\data\dataset_manager.py", line 182, in from_config
hfdataset = hf_load_dataset(path, name=name, **kwargs)
File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\load.py", line 2628, in load_dataset
builder_instance.download_and_prepare(
File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1029, in download_and_prepare
self._download_and_prepare(
File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1791, in _download_and_prepare
super()._download_and_prepare(
File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1124, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1629, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1786, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset

What's your environment?

AdaSeq Version (e.g., 1.0 or master): 0.6.6
ModelScope Version (e.g., 1.0 or master): 1.18.1
PyTorch Version (e.g., 1.12.1): tried both 1.12.1 and 1.9.0
OS (e.g., Ubuntu 20.04): Windows 10
Python version: 3.9
CUDA/cuDNN version:
GPU models and configuration:
Any other relevant information:

Code of Conduct

I agree to follow this project's Code of Conduct


lengyanglph · 2024-11-08T03:47:50Z

I have the same problem. 'C:/Users/Acer/.cache/huggingface/datasets/named_entity_recognition_dataset_builder/default-84b1c02799fb57ba/0.0.0/db737b9bb893f20fb03d04403a30bf7c033256c212b7e9f0ebc6e9c958535c51.incomplete/named_entity_recognition_dataset_builder-test-00000-00000-of-NNNNN.arrow' is a path in the local cache. The "incomplete" marker means the cache file has not been generated yet, so accessing this nonexistent file raises the error.
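To check this on your own machine, a small standard-library sketch like the one below reports whether the file and its parent directory exist, and how long the path is. The length check is an assumption that goes beyond what this thread confirms: classic Win32 file APIs limit paths to 260 characters (MAX_PATH) unless long-path support is enabled, and the cache path in the traceback is long enough to make that worth ruling out. The function name `describe_cache_path` is hypothetical.

```python
import os

# Classic Win32 MAX_PATH limit; it includes the terminating NUL character,
# so a path of 260 or more characters may already be too long.
WINDOWS_MAX_PATH = 260


def describe_cache_path(path: str) -> dict:
    """Report basic facts about a cache file path copied from a traceback."""
    parent = os.path.dirname(path)
    return {
        "length": len(path),
        "may_exceed_max_path": len(path) >= WINDOWS_MAX_PATH,
        "parent_exists": os.path.isdir(parent),
        "file_exists": os.path.isfile(path),
    }


if __name__ == "__main__":
    # Replace this with the full path from your own FileNotFoundError message.
    example = os.path.join(os.path.expanduser("~"), ".cache", "huggingface", "datasets")
    print(describe_cache_path(example))
```

If `may_exceed_max_path` is true for your path, a shorter cache location may be worth trying; if `parent_exists` is false, the `.incomplete` directory was never created or was removed before the writer opened the file.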

lwj01 · 2024-11-14T12:20:03Z

@lengyanglph Did you manage to solve this?

lengyanglph · 2024-11-25T03:37:39Z

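@lwj01 My approach is to preprocess the data files myself, save the result locally, and then load that instead:
1. Download the data files referenced in the YAML config, then load them with the load_dataset method from datasets. Plain open should also work.
2. Find the data-processing code and copy it out, use it to process the text and generate the dataset, then save the dataset locally.
3. Change dataset:datas_file in the YAML file to the path of the dataset you saved locally.
4. In dataset_manager.py, around line 180, change "hfdataset = hf_load_dataset(" to "hfdataset = load_from_disk(path_to_disk)".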
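The steps above could be sketched roughly as follows. This is a minimal illustration, not AdaSeq's own preprocessing: it assumes the downloaded files are CoNLL-style (one "token label" pair per line, blank line between sentences), and the column names `tokens` and `labels` are placeholders that would need to match whatever schema AdaSeq's dataset builder actually produces. The helper names `read_conll`, `save_splits`, and `load_saved` are hypothetical. `Dataset.from_dict`, `DatasetDict.save_to_disk`, and `load_from_disk` are existing datasets APIs.

```python
from pathlib import Path


def read_conll(path):
    """Parse a CoNLL-style file into a list of {"tokens": [...], "labels": [...]}."""
    sentences, tokens, labels = [], [], []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if tokens:
                sentences.append({"tokens": tokens, "labels": labels})
                tokens, labels = [], []
            continue
        parts = line.split()
        tokens.append(parts[0])   # first column: the token
        labels.append(parts[-1])  # last column: the tag
    if tokens:
        sentences.append({"tokens": tokens, "labels": labels})
    return sentences


def save_splits(files, out_dir):
    """Build a DatasetDict from {split_name: file_path} and save it to out_dir."""
    # Imported lazily so the parser above works without datasets installed.
    from datasets import Dataset, DatasetDict

    splits = {}
    for split, path in files.items():
        rows = read_conll(path)
        # Column names are an assumption; adjust them to the schema AdaSeq expects.
        splits[split] = Dataset.from_dict({
            "tokens": [r["tokens"] for r in rows],
            "labels": [r["labels"] for r in rows],
        })
    DatasetDict(splits).save_to_disk(out_dir)


def load_saved(out_dir):
    """Load the saved dataset, as in the modified dataset_manager.py line."""
    from datasets import load_from_disk

    return load_from_disk(out_dir)
```

A call such as `save_splits({"train": "train.txt", "test": "test.txt"}, "resume_local")` (paths hypothetical) would write the dataset once, after which `load_saved("resume_local")` reads it back without going through the builder's cache directory.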

Gsq6161 added the "question" (Further information is requested) label on Sep 27, 2024
