留个纪念,记录下解决问题的思路
因为后来翻issue发现官方把问题都修复了(早知道我也去提了 :)

有下列问题的,拉一下最新代码就行了吧,除了【未解决】的那个还卡着。


π0 github地址
在做:π0基于自己的数据集微调 Fine-Tuning Base Models on Your Own Data 的时候遇到各种报错,记录一下。

推荐教程:π0的微调——如何基于各种开源数据集、以及私有数据集微调通用VLA π0(含我司七月的微调实践及在机械臂上的部署)


【未解决】XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi0_fast_libero --exp-name=my_experiment --overwrite 卡在 Restoring checkpoint from /home/xxx/.cache/openpi/openpi-assets/checkpoints/pi0_fast_base/params

卡在 Restoring checkpoint from /home/xxx/.cache/openpi/openpi-assets/checkpoints/pi0_fast_base/params 就直接结束了,
还不知道怎么解决,等万能的网友回复 : )
在这里插入图片描述

执行代码如下:

git clone https://github.com/Physical-Intelligence/openpi.git
git submodule update --init --recursive
GIT_LFS_SKIP_SMUDGE=1 uv sync
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e .
uv run scripts/compute_norm_stats.py --config-name pi0_fast_libero
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi0_fast_libero --exp-name=my_experiment --overwrite

跳过了:
Convert your data to a LeRobot dataset:uv run examples/libero/convert_libero_data_to_lerobot.py --data_dir /path/to/your/libero/data 这一步,难道是我看compute_norm_stats.py下载了数据集,我就跳过了convert_libero_data_to_lerobot,所以卡住了??但我跳过的原因是觉得已经下载了LeRobot数据集v2.0


发现下面这些问题,官方都修复了,只要拉取最新的代码就行

convert_libero_data_to_lerobot.py 报错:ImportError: cannot import name ‘LEROBOT_HOME’ from ‘lerobot.common.datasets.lerobot_dataset’

modified_libero_rlds数据集我放在了~/modified_libero_rlds这个路径
所以执行的代码是
uv run examples/libero/convert_libero_data_to_lerobot.py --data_dir ~/modified_libero_rlds 为了方便,还是写上/path/to/your/libero/data,下一节有个 No module named 'tensorflow_datasets’的报错,需要把uv run改成python,即python examples/libero/convert_libero_data_to_lerobot.py --data_dir ~/modified_libero_rlds

执行uv run examples/libero/convert_libero_data_to_lerobot.py --data_dir /path/to/your/libero/data
报错:

Traceback (most recent call last):
File “/root/openpi/examples/libero/convert_libero_data_to_lerobot.py”, line 23, in
from lerobot.common.datasets.lerobot_dataset import LEROBOT_HOME
ImportError: cannot import name ‘LEROBOT_HOME’ from ‘lerobot.common.datasets.lerobot_dataset’

解决方法:
convert_libero_data_to_lerobot.py中注释掉from lerobot.common.datasets.lerobot_dataset import LEROBOT_HOME,然后补上LEROBOT_HOME还有REPO_NAME 的定义
修改convert_libero_data_to_lerobot.py如下:

#from lerobot.common.datasets.lerobot_dataset import LEROBOT_HOME 注释掉
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
import tensorflow_datasets as tfds
import tyro

from pathlib import Path # 要加上
LEROBOT_HOME = Path("\~/modified_libero_rlds") # 随便放,不一定是libero数据集路径
REPO_NAME = Path("haha/libero")  # 先随便起个名
RAW_DATASET_NAMES = [
    "libero_10_no_noops",
    "libero_goal_no_noops",
    "libero_object_no_noops",
    "libero_spatial_no_noops",
]  
def main(data_dir: str, *, push_to_hub: bool = False): # data_dir是libero数据集路径
    # Clean up any existing dataset in the output directory
    output_path = LEROBOT_HOME / REPO_NAME
    if output_path.exists(): # 如果这里报错了,就是haha这个文件夹存在了,把它删掉重跑即可
        shutil.rmtree(output_path)

记得如果用os.join.path的话会报错如下,需要用from pathlib import Path

Traceback (most recent call last):
File “/root/openpi/examples/libero/convert_libero_data_to_lerobot.py”, line 106, in
tyro.cli(main)
File “/home/vipuser/miniconda3/lib/python3.11/site-packages/tyro/_cli.py”, line 229, in cli
return run_with_args_from_cli()
^^^^^^^^^^^^^^^^^^^^^^^^
File “/root/openpi/examples/libero/convert_libero_data_to_lerobot.py”, line 39, in main
output_path = LEROBOT_HOME / REPO_NAME
~~~^
TypeError: unsupported operand type(s) for /: ‘str’ and ‘str’


convert_libero_data_to_lerobot.py 报错:ModuleNotFoundError: No module named ‘tensorflow_datasets’ 和 Missing features: {‘task’}

执行uv run examples/libero/convert_libero_data_to_lerobot.py --data_dir /path/to/your/libero/data 报错:

Traceback (most recent call last):
File “/root/openpi/examples/libero/convert_libero_data_to_lerobot.py”, line 25, in
import tensorflow_datasets as tfds
ModuleNotFoundError: No module named ‘tensorflow_datasets’

查看到相关issues:https://github.com/Physical-Intelligence/openpi/issues/354 得知,要改成:
python examples/libero/convert_libero_data_to_lerobot.py --data_dir /path/to/your/libero/data ,改完后会有一个task的报错Missing features: {'task'}
在这里插入图片描述继续修改convert_libero_data_to_lerobot.py

for raw_dataset_name in RAW_DATASET_NAMES:
    raw_dataset = tfds.load(raw_dataset_name, data_dir=data_dir, split="train")
    for episode in raw_dataset:
        for step in episode["steps"].as_numpy_iterator():
            dataset.add_frame(
                {
                    "image": step["observation"]["image"],
                    "wrist_image": step["observation"]["wrist_image"],
                    "state": step["observation"]["state"],
                    "actions": step["action"],
                    "task": step["language_instruction"].decode(), #补上这句
                }
            )
        dataset.save_episode() #记得这里参数有个task=xxx的,也要删掉

就ok了,开始将Libero数据集转换为LeRobot数据集v2.0格式
在这里插入图片描述

compute_norm_stats.py 报错:No such file or directory: ‘/root/.cache/huggingface/lerobot/physical-intelligence/libero/meta/info.json’

REPO_NAME改成haha后,在第二步的uv run scripts/compute_norm_stats.py --config-name pi0_fast_libero中,在\src\openpi\training\config.py中搜索pi0_fast_libero,把里头的physical-intelligence/libero改成haha。否则

    TrainConfig(
        name="pi0_fast_libero",
        model=pi0_fast.Pi0FASTConfig(action_dim=7, action_horizon=10, max_token_len=180),
        data=LeRobotLiberoDataConfig(
            repo_id="haha/libero", # "physical-intelligence/libero"
            base_config=DataConfig(
                local_files_only=True,  # Set to True for local-only datasets.
                prompt_from_task=True,
            ),
        ),
        # Note that we load the pi0-FAST base model checkpoint here.
        weight_loader=weight_loaders.CheckpointWeightLoader("s3://openpi-assets/checkpoints/pi0_fast_base/params"),
        num_train_steps=30_000,
    ),

开始计算训练数据的归一化统计量:
在这里插入图片描述

compute_norm_stats.py后,显示Restoring checkpoint from /root/.cache/openpi/openpi-assets/checkpoints/pi0_fast_base/params后就停止了

在这里插入图片描述
换成
uv run scripts/compute_norm_stats.py --config-name pi0_fast_libero_low_mem_finetune
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi0_fast_libero_low_mem_finetune --exp-name=my_experiment --overwrite –

Logo

火山引擎开发者社区是火山引擎打造的AI技术生态平台,聚焦Agent与大模型开发,提供豆包系列模型(图像/视频/视觉)、智能分析与会话工具,并配套评测集、动手实验室及行业案例库。社区通过技术沙龙、挑战赛等活动促进开发者成长,新用户可领50万Tokens权益,助力构建智能应用。

更多推荐