[Question] Running the quick-start example python scripts/train.py -c examples/bert_crf/configs/resume.yaml fails with "An error occurred while generating the dataset" #47

Open
1 task done
Gsq6161 opened this issue Sep 27, 2024 · 3 comments
Labels
question Further information is requested

Comments

Gsq6161 commented Sep 27, 2024

What is your question?

I am a complete beginner. I am deploying AdaSeq locally and followed the steps in the repository, but execution fails at this point:
    except Exception as e:
        # Ignore the writer's error for no examples written to the file if this error was caused by the error in _generate_examples before the first example was yielded
        if isinstance(e, SchemaInferenceError) and e.__context__ is not None:
            e = e.__context__
        raise DatasetGenerationError("An error occurred while generating the dataset") from e
How can I fix this?

What have you tried?

Downgrading torch and downgrading datasets both made no difference.

Code (if necessary)

(adaseq) PS C:\Users\Acer\Desktop\AdaSeq-master> python scripts/train.py -c examples/bert_crf/configs/resume.yaml
2024-09-27 21:32:46,554 - modelscope - WARNING - The reference has been Deprecated in modelscope v1.4.0+, please use from modelscope.msdatasets.dataset_cls.custom_datasets import TorchCustomDataset
2024-09-27 21:32:47,201 - INFO - adaseq.data.dataset_manager - Will use a custom loading script: E:\Anaconda\envs\adaseq\lib\site-packages\adaseq\data\dataset_builders\named_entity_recognition_dataset_builder.py
Downloading data: 135kB [00:00, 2.86MB/s]
Downloading data: 1.09MB [00:00, 10.4MB/s]
Downloading data: 120kB [00:00, 2.56MB/s]
Generating test split: 0 examples [00:00, ? examples/s]
Traceback (most recent call last):
File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1739, in _prepare_split_single
writer = writer_class(
File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\arrow_writer.py", line 338, in __init__
self.stream = self._fs.open(path, "wb")
File "E:\Anaconda\envs\adaseq\lib\site-packages\fsspec\spec.py", line 1303, in open
f = self._open(
File "E:\Anaconda\envs\adaseq\lib\site-packages\fsspec\implementations\local.py", line 191, in _open
return LocalFileOpener(path, mode, fs=self, **kwargs)
File "E:\Anaconda\envs\adaseq\lib\site-packages\fsspec\implementations\local.py", line 355, in __init__
self._open()
File "E:\Anaconda\envs\adaseq\lib\site-packages\fsspec\implementations\local.py", line 360, in _open
self.f = open(self.path, mode=self.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/Acer/.cache/huggingface/datasets/named_entity_recognition_dataset_builder/default-84b1c02799fb57ba/0.0.0/db737b9bb893f20fb03d04403a30bf7c033256c212b7e9f0ebc6e9c958535c51.incomplete/named_entity_recognition_dataset_builder-test-00000-00000-of-NNNNN.arrow'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\Acer\Desktop\AdaSeq-master\scripts\train.py", line 39, in <module>
train_model_from_args(args)
File "E:\Anaconda\envs\adaseq\lib\site-packages\adaseq\commands\train.py", line 84, in train_model_from_args
train_model(
File "E:\Anaconda\envs\adaseq\lib\site-packages\adaseq\commands\train.py", line 156, in train_model
trainer = build_trainer_from_partial_objects(
File "E:\Anaconda\envs\adaseq\lib\site-packages\adaseq\commands\train.py", line 185, in build_trainer_from_partial_objects
dm = DatasetManager.from_config(task=config.task, **config.dataset)
File "E:\Anaconda\envs\adaseq\lib\site-packages\adaseq\data\dataset_manager.py", line 182, in from_config
hfdataset = hf_load_dataset(path, name=name, **kwargs)
File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\load.py", line 2628, in load_dataset
builder_instance.download_and_prepare(
File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1029, in download_and_prepare
self._download_and_prepare(
File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1791, in _download_and_prepare
super()._download_and_prepare(
File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1124, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1629, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "E:\Anaconda\envs\adaseq\lib\site-packages\datasets\builder.py", line 1786, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset

What's your environment?

  • AdaSeq Version (e.g., 1.0 or master):0.6.6
  • ModelScope Version (e.g., 1.0 or master):1.18.1
  • PyTorch Version (e.g., 1.12.1): tried both 1.12.1 and 1.9.0
  • OS (e.g., Ubuntu 20.04): Windows 10
  • Python version:3.9
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:

Code of Conduct

  • I agree to follow this project's Code of Conduct
@Gsq6161 Gsq6161 added the question Further information is requested label Sep 27, 2024
@Gsq6161 Gsq6161 changed the title [Question] Running python scripts/train.py -c examples/bert_crf/configs/resume.yaml fails with "An error occurred while generating the dataset" [Question] Running the quick-start example python scripts/train.py -c examples/bert_crf/configs/resume.yaml fails with "An error occurred while generating the dataset" Sep 27, 2024
@lengyanglph

I have the same problem. 'C:/Users/Acer/.cache/huggingface/datasets/named_entity_recognition_dataset_builder/default-84b1c02799fb57ba/0.0.0/db737b9bb893f20fb03d04403a30bf7c033256c212b7e9f0ebc6e9c958535c51.incomplete/named_entity_recognition_dataset_builder-test-00000-00000-of-NNNNN.arrow' is the local cache. The .incomplete marker means the cache file has not been generated yet, so accessing this nonexistent file raises the error...
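
One way to check this explanation is to inspect the path from the traceback directly. The following is a minimal sketch, not part of datasets or AdaSeq: `describe_target` is a hypothetical helper name, and the path is passed in by the caller. It also reports the path length because Windows by default limits paths to 260 characters unless long-path support is enabled. Whether that limit plays a role in this issue is an assumption and has not been confirmed in this thread.

```python
import os


def describe_target(path: str) -> dict:
    """Report basic facts about a file path that failed to open.

    Hypothetical diagnostic helper; it only inspects the filesystem
    and does not touch the Hugging Face cache itself.
    """
    parent = os.path.dirname(path)
    return {
        # Total number of characters in the path string.
        "path_length": len(path),
        # Opening a file for writing fails if its directory is missing.
        "parent_exists": os.path.isdir(parent),
        "file_exists": os.path.exists(path),
    }
```

Calling `describe_target(...)` with the path quoted in the `FileNotFoundError` shows whether the `.incomplete` directory was ever created and how long the full path is.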

@lwj01

lwj01 commented Nov 14, 2024

@lengyanglph Have you managed to solve this?

@lengyanglph

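@lwj01 My approach is to preprocess the data files myself, save the result locally, and then load that:
1. Download the data files listed in the yaml, then load them with the load_dataset method from datasets (plain open should work too).
2. Find the data-processing code and copy it out, use it to process the text and build the dataset, then save the dataset locally.
3. Change dataset: data_file in the yaml file to the path of the dataset you saved locally.
4. In dataset_manager.py, around line 180, change "hfdataset = hf_load_dataset(" to "hfdataset = load_from_disk(path_to_disk)".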
WX:15964928893
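
The steps above could be sketched roughly as follows. This is a minimal illustration under several assumptions, not AdaSeq's own preprocessing: it assumes the downloaded files are CoNLL-style text (one "token label" pair per line, blank line between sentences), and the function names `read_conll` and `save_splits` as well as the column names `tokens` and `labels` are hypothetical. The columns AdaSeq actually expects should be taken from its dataset builder code, as step 2 says.

```python
from typing import Dict, List


def read_conll(path: str) -> Dict[str, List[List[str]]]:
    """Parse a CoNLL-style file: one "token label" pair per line,
    with a blank line between sentences (assumed format)."""
    sentences: List[List[str]] = []
    tags: List[List[str]] = []
    cur_tokens: List[str] = []
    cur_tags: List[str] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if not parts:
                # A blank line closes the current sentence.
                if cur_tokens:
                    sentences.append(cur_tokens)
                    tags.append(cur_tags)
                    cur_tokens, cur_tags = [], []
                continue
            cur_tokens.append(parts[0])   # first column: token
            cur_tags.append(parts[-1])    # last column: label
    if cur_tokens:
        # The file did not end with a blank line.
        sentences.append(cur_tokens)
        tags.append(cur_tags)
    # Column names are hypothetical; match them to AdaSeq's builder.
    return {"tokens": sentences, "labels": tags}


def save_splits(files: Dict[str, str], out_dir: str) -> None:
    """Build a DatasetDict from {split_name: local_file} and save it."""
    # Imported here so the parser above works without `datasets` installed.
    from datasets import Dataset, DatasetDict

    splits = {name: Dataset.from_dict(read_conll(p)) for name, p in files.items()}
    DatasetDict(splits).save_to_disk(out_dir)
```

After something like `save_splits({"train": "train.txt", "test": "test.txt"}, "resume_ner_local")` (file and directory names are placeholders), `datasets.load_from_disk("resume_ner_local")` returns the saved splits, which is the call step 4 substitutes into dataset_manager.py.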

3 participants