transformers_datasets error

When using the Hugging Face datasets library, the following error is raised:

multiprocess.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
File "/Users/admin/miniforge3/envs/py39/lib/python3.9/site-packages/multiprocess/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/Users/admin/miniforge3/envs/py39/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 1353, in _write_generator_to_queue
for i, result in enumerate(func(**kwargs)):
File "/Users/admin/miniforge3/envs/py39/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3397, in _map_single
writer.write_batch(batch)
File "/Users/admin/miniforge3/envs/py39/lib/python3.9/site-packages/datasets/arrow_writer.py", line 554, in write_batch
pa_table = pa.Table.from_arrays(arrays, schema=schema)
File "pyarrow/table.pxi", line 3657, in pyarrow.lib.Table.from_arrays
File "pyarrow/table.pxi", line 1416, in pyarrow.lib._sanitize_arrays
ValueError: Schema and number of arrays unequal
"""

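The error message does not make the problem obvious, and upstream may reword it in a later release. The real cause is that some batches of data are empty. Check your batch data and make sure no batch is empty.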
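One way to track down the offending batches is to run the processing function over the raw batches outside of `map` and inspect what it returns. The following is a minimal sketch in plain Python, not part of the datasets API: `find_bad_batches`, `tokenize_batch`, and the column names `text` and `length` are hypothetical names chosen for illustration. It assumes each batch is a dict mapping column names to lists, and it conservatively flags a batch that has no columns, has zero rows, or is missing an expected column.

```python
from typing import Dict, Iterable, List

Batch = Dict[str, list]


def find_bad_batches(batches: Iterable[Batch], expected_columns: List[str]) -> List[int]:
    """Return the indices of batches that are empty or miss an expected column."""
    bad = []
    for i, batch in enumerate(batches):
        # A column that is absent means fewer arrays than the schema expects.
        missing = [c for c in expected_columns if c not in batch]
        # "Empty" here means no columns at all, or every column has zero rows.
        empty = (not batch) or all(len(v) == 0 for v in batch.values())
        if missing or empty:
            bad.append(i)
    return bad


def tokenize_batch(batch: Batch) -> Batch:
    """Hypothetical processing function that drops blank texts."""
    texts = [t for t in batch["text"] if t.strip()]
    if not texts:
        return {}  # the problematic case: nothing is returned for this batch
    return {"text": texts, "length": [len(t) for t in texts]}


raw_batches = [
    {"text": ["hello", "world"]},
    {"text": ["", "  "]},  # every row is blank, so the output is empty
    {"text": ["ok"]},
]
outputs = [tokenize_batch(b) for b in raw_batches]
bad = find_bad_batches(outputs, ["text", "length"])
print(bad)
```

Once the bad indices are known, the processing function can be changed so that it always returns every expected column, or the blank rows can be filtered out before the batches are processed.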


https://johnson7788.github.io/2023/04/24/transformers-datasets%E6%8A%A5%E9%94%99/
Author
Johnson
Published on
April 24, 2023