Beginner

Repeating the Validation-Set Approach on the Default Data Set

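Beginner reference: the complete example code is provided for reference only. Try to understand it first, then type it out yourself.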
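import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def solve():
    # Load the Default data set and encode the response: 'Yes' -> 1, 'No' -> 0
    default = pd.read_csv('https://liangdaima.com/static/data/statistics/Default.csv')
    default['default'] = (default['default'] == 'Yes').astype(int)
    X = default[['income', 'balance']]
    y = default['default']
    # Repeat the validation-set approach with three different random splits
    for seed in [1, 5, 10]:
        # Hold out 30% of the observations as the validation set
        X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=seed)
        model = LogisticRegression()
        model.fit(X_train, y_train)
        pred = model.predict(X_val)
        # Validation error rate = 1 - accuracy = fraction of misclassified observations
        print(f'Validation error rate with random seed {seed}:', 1 - accuracy_score(y_val, pred))

Example

Input
solve()
Expected output
Validation error rate with random seed 1: 0.024666666666666615
Validation error rate with random seed 5: 0.025000000000000022
Validation error rate with random seed 10: 0.02966666666666662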
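The validation error rate is simply the number of misclassified validation observations divided by the size of the validation set, which is what `1 - accuracy_score(y_val, pred)` computes. Assuming the Default file has 10,000 rows (as in the ISLR version of this data set), `test_size=0.3` leaves 3,000 validation observations, so the three expected values correspond to 74, 75 and 89 errors; the long tails of digits (such as `...615`) are most likely floating-point noise from the subtraction `1 - accuracy`. The minimal sketch below spells out that arithmetic with a hypothetical pure-Python helper, `validation_error_rate`, on toy labels.

```python
def validation_error_rate(y_true, y_pred):
    """Hypothetical helper: fraction of observations whose predicted label is wrong."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mistakes = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return mistakes / len(y_true)


# Toy labels (illustrative only): 2 of the 10 predictions are wrong.
y_val = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
pred = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
print(validation_error_rate(y_val, pred))  # 0.2

# Error counts implied by the expected output, ASSUMING 3,000 validation rows
# (10,000 observations * test_size 0.3).
for errors in (74, 75, 89):
    print(errors, errors / 3000)
```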
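Each seed gives a different error rate because `random_state` fixes the random shuffle that decides which observations end up in the validation set. The sketch below imitates that idea with Python's `random.Random`. It is not scikit-learn's implementation, which uses NumPy's random generator, so its index sets will not match those from `train_test_split`. `split_indices` is a hypothetical helper, and rounding the validation size up is a choice made here for illustration.

```python
import math
import random


def split_indices(n, test_size=0.3, seed=None):
    """Hypothetical helper: seeded random hold-out split of range(n) -> (train, val)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)   # same seed -> same permutation
    n_val = math.ceil(n * test_size)       # validation size, rounded up
    return sorted(indices[n_val:]), sorted(indices[:n_val])


train_a, val_a = split_indices(10, seed=1)
train_b, val_b = split_indices(10, seed=1)
train_c, val_c = split_indices(10, seed=5)
print(val_a == val_b)   # True: the same seed reproduces the same split
print(val_a, val_c)     # different seeds usually select different rows
```

Because the fitted model and the held-out rows both change with the seed, the validation error rate changes as well, even though the data set and the model specification stay the same.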
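The three printed values illustrate the main point of the exercise: the validation-set estimate of the test error depends on which observations happen to be held out. One quick way to quantify this, using only the numbers from the expected output, is to compute their mean, sample standard deviation and range with the standard-library `statistics` module. This is a sketch for illustration, not part of the reference solution.

```python
import statistics

# Validation error rates copied from the expected output for seeds 1, 5 and 10.
errors = {1: 0.024666666666666615, 5: 0.025000000000000022, 10: 0.02966666666666662}

rates = list(errors.values())
mean_rate = statistics.mean(rates)
sd_rate = statistics.stdev(rates)      # sample standard deviation (divides by n - 1)
spread = max(rates) - min(rates)       # gap between the best and worst split

print(f"mean error rate: {mean_rate:.4f}")
print(f"std. deviation:  {sd_rate:.4f}")
print(f"range:           {spread:.4f}")
```

With only three splits this gives a rough picture. Running the loop over more seeds would show the variability more reliably.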