Beginner

Summarize Material Uploads and OCR Parsing Status

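Beginner reference: the complete example code is provided for reference. Work through it until you understand it, then type it in again yourself.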
def solve():
    # pyodide.http.open_url fetches the CSV synchronously inside the browser runtime
    from pyodide.http import open_url
    from io import StringIO
    import pandas as pd

    url = "https://data.zuihe.com/dbd/ms-memcard/state_00/materials.csv"
    materials = pd.read_csv(StringIO(open_url(url).read()))

    print(f"Total materials: {len(materials)}")
    # groupby(...).size() counts rows per group; dict() keeps the NumPy integer values
    print(f"By status: {dict(materials.groupby('ocr_status').size())}")
    print(f"By file type: {dict(materials.groupby('file_type').size())}")

    # Average only over materials whose OCR parse finished
    parsed = materials[materials['ocr_status'] == 'parsed']
    print(f"Parsed: {len(parsed)}")
    print(f"Avg chars (parsed): {round(parsed['char_count'].mean(), 0)}")
    print(f"Avg pages (parsed): {round(parsed['page_count'].mean(), 1)}")

Example

Input
solve()
Expected output
Total materials: 40
By status: {'failed': np.int64(9), 'parsed': np.int64(21), 'parsing': np.int64(10)}
By file type: {'md': np.int64(16), 'pdf': np.int64(11), 'txt': np.int64(13)}
Parsed: 21
Avg chars (parsed): 25507.0
Avg pages (parsed): 28.8
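
To check your understanding of what the reference solution computes, the same statistics can be reproduced without pandas or network access. The sketch below is only an illustration: the sample rows, the `material_id` column, and the `summarize` helper are made up for this example, and only the `file_type`, `ocr_status`, `char_count`, and `page_count` column names come from the code above. It uses `collections.Counter` in place of `groupby(...).size()`, so the counts are plain Python integers rather than `np.int64` values.

```python
import csv
from collections import Counter
from io import StringIO

# Hypothetical sample data; the real materials.csv may contain other columns and rows.
SAMPLE_CSV = """material_id,file_type,ocr_status,char_count,page_count
1,pdf,parsed,1200,3
2,md,failed,0,0
3,txt,parsed,800,1
4,pdf,parsing,0,0
5,md,parsed,1000,2
"""


def summarize(csv_text):
    """Return the same six statistics that solve() prints, as a dict."""
    rows = list(csv.DictReader(StringIO(csv_text)))
    # Keep only materials whose OCR parse finished (assumes at least one exists).
    parsed = [r for r in rows if r["ocr_status"] == "parsed"]
    return {
        "total": len(rows),
        # Counter tallies rows per value; sorting mimics groupby's key order.
        "by_status": dict(sorted(Counter(r["ocr_status"] for r in rows).items())),
        "by_file_type": dict(sorted(Counter(r["file_type"] for r in rows).items())),
        "parsed": len(parsed),
        "avg_chars": round(sum(int(r["char_count"]) for r in parsed) / len(parsed), 0),
        "avg_pages": round(sum(int(r["page_count"]) for r in parsed) / len(parsed), 1),
    }


summary = summarize(SAMPLE_CSV)
print(summary)
```

Comparing the dictionary printed here with the pandas output on the same rows is a quick way to confirm that the filtering and averaging steps behave as you expect.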