Usage Examples

Data Preparation

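psychometric expects its input as an n_subject x n_item pd.DataFrame.
Column names are arbitrary, but each column must represent one item. Here we use the built-in generator to create 5-factor data for 200 subjects and 80 items.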

In [1]:
from psychometric.utils import generate_data
from matplotlib import pyplot as plt
data = generate_data(n_samples=200, n_items=80, n_factors=5, random_seed=42)
data # n_subjects x n_items DataFrame

fig, axes = plt.subplots(1, 2, figsize=(4, 2))
for ax, img, title, (xlab, ylab) in [
    (axes[0], data, "Input DataFrame", ("Items", "Subjects")),
    (axes[1], data.corr(), "Item Correlation", ("Items", "Items")),
]:
    ax.imshow(img, cmap="viridis", aspect="auto")
    ax.set(title=title, xlabel=xlab, ylabel=ylab)
fig.tight_layout()
plt.show()
[Figure: two heatmaps, "Input DataFrame" (x: Items, y: Subjects) and "Item Correlation" (x: Items, y: Items)]
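
When starting from your own responses instead of generated data, it can help to confirm the subjects-by-items layout described above before running any analysis. The following is a minimal pandas sketch. The helper name `check_item_matrix` and the toy responses are hypothetical and are not part of psychometric; rejecting missing values is a choice made for this sketch, not a documented requirement of the library.

```python
import pandas as pd


def check_item_matrix(df: pd.DataFrame) -> tuple[int, int]:
    """Validate a subjects-by-items table and return (n_subjects, n_items).

    Hypothetical helper for illustration; not part of psychometric.
    """
    if not isinstance(df, pd.DataFrame):
        raise TypeError("expected a pandas DataFrame")
    # Every column is treated as one item, so every column must be numeric.
    non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        raise ValueError(f"non-numeric item columns: {non_numeric}")
    # This sketch simply rejects missing responses instead of imputing them.
    if df.isna().any().any():
        raise ValueError("missing responses found; impute or drop them first")
    return df.shape


# Toy responses: 4 subjects (rows) x 3 items (columns).
raw = pd.DataFrame(
    [[1, 0, 1], [0, 0, 1], [1, 1, 1], [0, 1, 0]],
    columns=["Q1", "Q2", "Q3"],
)
n_subjects, n_items = check_item_matrix(raw)
print(n_subjects, n_items)
```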

Item Analysis

This section covers part of the item-level indicators; others are reported under the factor and reliability analyses.

In [32]:
from psychometric import ItemAnalysis

ia = ItemAnalysis(data)
ia_result = ia.analyze()
print(f"Item metrics: {ia_result.keys()}")
Item metrics: dict_keys(['difficulty', 'citc', 'extreme_group', 'summary'])
In [33]:
ia_result['summary']
Out[33]:
    item  difficulty  level      citc  quality  difference  significant  discrimination  recommendation
0     Q1    0.475000  中     0.348294  良好       0.462963         True  优秀            保留
1     Q2    0.467500  中     0.299068  可接受     0.500000         True  优秀            考虑修改
2     Q3    0.447500  中     0.277061  可接受     0.462963         True  优秀            考虑修改
3     Q4    0.640000  中     0.293378  可接受     0.518519         True  优秀            考虑修改
4     Q5    0.628333  中     0.305394  良好       0.555556         True  优秀            保留
..   ...         ...  ...         ...  ...             ...          ...  ...             ...
75   Q76    0.301667  中     0.409116  优秀       0.629630         True  优秀            保留
76   Q77    0.316667  中     0.395270  良好       0.629630         True  优秀            保留
77   Q78    0.313333  中     0.361828  良好       0.500000         True  优秀            保留
78   Q79    0.303333  中     0.296451  可接受     0.407407         True  优秀            考虑修改
79   Q80    0.462500  中     0.361496  良好       0.444444         True  优秀            保留

80 rows × 9 columns

The library emits its labels in Chinese: 中 = medium, 可接受 = acceptable, 良好 = good, 优秀 = excellent, 保留 = keep, 考虑修改 = consider revising.
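
To make the columns of the summary easier to read, here is a rough sketch of how difficulty, corrected item-total correlation (CITC), and an extreme-group difference are conventionally computed in classical test theory. It is not psychometric's own implementation: the function name `item_stats`, the 27% tail size, and the `max_score` parameter are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def item_stats(df: pd.DataFrame, max_score: float = 1.0, tail: float = 0.27) -> pd.DataFrame:
    """Classical item statistics (illustrative sketch, not the library's code)."""
    total = df.sum(axis=1)
    # Difficulty: mean item score as a proportion of the maximum possible score.
    difficulty = df.mean(axis=0) / max_score
    # CITC: correlate each item with the total of the *other* items.
    citc = pd.Series({col: df[col].corr(total - df[col]) for col in df.columns})
    # Extreme groups: compare the top and bottom `tail` share of subjects by total score.
    n_tail = max(1, int(round(len(df) * tail)))
    order = total.sort_values(kind="stable").index
    low, high = df.loc[order[:n_tail]], df.loc[order[-n_tail:]]
    difference = (high.mean(axis=0) - low.mean(axis=0)) / max_score
    return pd.DataFrame({"difficulty": difficulty, "citc": citc, "difference": difference})


# Toy binary responses driven by one latent ability (300 subjects, 6 items).
rng = np.random.default_rng(0)
ability = rng.normal(size=300)
toy = pd.DataFrame(
    (ability[:, None] + rng.normal(size=(300, 6)) > 0).astype(int),
    columns=[f"Q{i}" for i in range(1, 7)],
)
stats = item_stats(toy)
print(stats.round(3))
```

Subtracting the item from the total before correlating is what makes the correlation "corrected": otherwise each item would correlate with itself through the total score.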

Reliability Analysis

In [34]:
from psychometric import Reliability

rel = Reliability(data)
rel_result = rel.analyze()
print(f"Reliability metrics: {rel_result.keys()}")
Reliability metrics: dict_keys(['cronbach_alpha', 'alpha_if_deleted', 'split_half', 'omega'])

Cronbach's Alpha

In [35]:
rel_result['cronbach_alpha']
Out[35]:
{'alpha': np.float64(0.9231453987028012),
 'n_items': 80,
 'n_samples': 200,
 'quality': '优秀',  # "excellent"
 'standardized_alpha': np.float64(0.9231465916683137)}
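
For reference, the two alpha values reported above correspond to standard textbook formulas. The sketch below implements those formulas with NumPy; the function names are illustrative and the code is not taken from psychometric, so small numerical differences from the library are possible.

```python
import numpy as np


def cronbach_alpha(scores) -> float:
    """Raw alpha: k / (k - 1) * (1 - sum of item variances / variance of total)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1.0 - item_var.sum() / total_var))


def standardized_alpha(scores) -> float:
    """Standardized alpha from the mean inter-item correlation r: k*r / (1 + (k-1)*r)."""
    corr = np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)
    k = corr.shape[0]
    # Average the off-diagonal correlations (the diagonal contributes k ones).
    mean_r = (corr.sum() - k) / (k * (k - 1))
    return float(k * mean_r / (1.0 + (k - 1) * mean_r))


# Three perfectly parallel items: both coefficients should be 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
parallel = np.column_stack([x, x, x])
print(cronbach_alpha(parallel), standardized_alpha(parallel))
```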

Omega Reliability

In [36]:
rel_result['omega']
Out[36]:
{'omega_total': np.float64(0.8795071761542845),
 'n_factors': 1,
 'quality': '良好'}  # "good"
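
The output reports `n_factors` of 1, which suggests omega is based on a single-factor model, although how the library estimates the loadings is not shown here. Under that assumption, omega total can be sketched from standardized loadings as follows; the function name and the example loadings are hypothetical.

```python
def omega_total(loadings, uniquenesses=None) -> float:
    """Omega total for a one-factor model: (sum of loadings)^2 over total variance.

    Illustrative sketch. If no uniquenesses are given, loadings are assumed to be
    standardized, so each uniqueness is 1 - loading^2.
    """
    if uniquenesses is None:
        uniquenesses = [1.0 - lam * lam for lam in loadings]
    common = sum(loadings) ** 2
    return common / (common + sum(uniquenesses))


# Four items that each load 0.7 on the single factor (made-up values).
print(round(omega_total([0.7, 0.7, 0.7, 0.7]), 4))
```

Unlike alpha, this coefficient does not require all items to load equally on the factor, which is one reason the two values can differ on the same data.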

Split-Half Reliability

In [37]:
rel_result['split_half']
Out[37]:
{'method': 'even-odd',
 'half1_alpha': np.float64(0.8477386154572809),
 'half2_alpha': np.float64(0.8521508181377602),
 'correlation': np.float64(0.9382498374872311),
 'spearman_brown': np.float64(0.9681412781167456),
 'quality': '优秀'}  # "excellent"
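
The `spearman_brown` value above is the half-test correlation stepped up to full test length with the Spearman-Brown formula, 2r / (1 + r). The sketch below shows that correction together with one possible even-odd split. Which item positions the library assigns to each half is an assumption here, and the function names are illustrative.

```python
import numpy as np


def spearman_brown(r: float) -> float:
    """Step a half-test correlation up to the reliability of the full-length test."""
    return 2.0 * r / (1.0 + r)


def even_odd_split_half(scores) -> tuple[float, float]:
    """Correlate the sums of alternating items and apply the correction (sketch)."""
    scores = np.asarray(scores, dtype=float)
    half1 = scores[:, 0::2].sum(axis=1)  # items at positions 1, 3, 5, ...
    half2 = scores[:, 1::2].sum(axis=1)  # items at positions 2, 4, 6, ...
    r = float(np.corrcoef(half1, half2)[0, 1])
    return r, spearman_brown(r)


# Plugging in the correlation reported above gives the reported corrected value.
print(round(spearman_brown(0.9382498374872311), 6))
```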

Alpha If Item Deleted (Leave-One-Out)

In [39]:
rel_result['alpha_if_deleted']
Out[39]:
    item  alpha_if_deleted  alpha_change  recommendation  original_alpha
0     Q1          0.922251     -0.000894  保留                  0.923145
1     Q2          0.922526     -0.000620  保留                  0.923145
2     Q3          0.922659     -0.000486  保留                  0.923145
3     Q4          0.922590     -0.000556  保留                  0.923145
4     Q5          0.922513     -0.000632  保留                  0.923145
..   ...               ...           ...  ...                        ...
75   Q76          0.921856     -0.001290  保留                  0.923145
76   Q77          0.921950     -0.001196  保留                  0.923145
77   Q78          0.922163     -0.000982  保留                  0.923145
78   Q79          0.922535     -0.000610  保留                  0.923145
79   Q80          0.922213     -0.000933  保留                  0.923145

80 rows × 5 columns

The recommendation label 保留 means "keep". Here alpha_change equals alpha_if_deleted minus original_alpha, so a negative value means that removing the item would lower alpha.
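
The leave-one-out table can be approximated by recomputing alpha once per dropped item. The following sketch does this on toy data with one deliberately unrelated item; the function names and the toy data are illustrative and do not come from psychometric.

```python
import numpy as np
import pandas as pd


def cronbach_alpha(df: pd.DataFrame) -> float:
    """Raw Cronbach's alpha from item variances and total-score variance."""
    scores = df.to_numpy(dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1.0 - item_var.sum() / total_var))


def alpha_if_deleted(df: pd.DataFrame) -> pd.DataFrame:
    """Recompute alpha with each item removed in turn (illustrative sketch)."""
    base = cronbach_alpha(df)
    rows = []
    for col in df.columns:
        alpha = cronbach_alpha(df.drop(columns=col))
        rows.append({"item": col, "alpha_if_deleted": alpha, "alpha_change": alpha - base})
    return pd.DataFrame(rows)


# Four items share one latent trait; the fifth column is pure noise.
rng = np.random.default_rng(0)
trait = rng.normal(size=500)
toy = pd.DataFrame(
    trait[:, None] + rng.normal(size=(500, 4)),
    columns=["Q1", "Q2", "Q3", "Q4"],
)
toy["noise"] = rng.normal(size=500)
result = alpha_if_deleted(toy).set_index("item")
print(result.round(3))
```

In this toy example, dropping the noise item raises alpha while dropping any trait item lowers it, which is the pattern such a table is meant to expose.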

Validity

In [40]:
from psychometric import Validity

val = Validity(data)
val_result = val.analyze()
print(f"Validity metrics: {val_result.keys()}")
/Users/zgh/Desktop/workingdir/psychometric/.venv/lib/python3.12/site-packages/factor_analyzer/utils.py:244: UserWarning: The inverse of the variance-covariance matrix was calculated using the Moore-Penrose generalized matrix inversion, due to its determinant being at or very close to zero.
  warnings.warn(
Validity metrics: dict_keys(['efa', 'ave_cr', 'discriminant_validity'])
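
The `ave_cr` and `discriminant_validity` keys presumably refer to average variance extracted (AVE), composite reliability (CR), and a discriminant-validity check. Their contents are not shown in this example, so the sketch below only illustrates the conventional formulas from standardized loadings, plus the Fornell-Larcker comparison as one common discriminant check. All function names and numbers are hypothetical.

```python
def ave(loadings) -> float:
    """Average variance extracted: mean of squared standardized loadings."""
    return sum(lam * lam for lam in loadings) / len(loadings)


def composite_reliability(loadings) -> float:
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    common = sum(loadings) ** 2
    error = sum(1.0 - lam * lam for lam in loadings)
    return common / (common + error)


def fornell_larcker_ok(ave_a: float, ave_b: float, corr_ab: float) -> bool:
    """Each factor's AVE should exceed its squared correlation with the other factor."""
    return min(ave_a, ave_b) > corr_ab ** 2


# Made-up standardized loadings for one factor.
lams = [0.8, 0.7, 0.6]
print(round(ave(lams), 4), round(composite_reliability(lams), 4))
print(fornell_larcker_ok(0.60, 0.55, 0.5))
```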

Exploratory Factor Analysis (EFA)

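By default, factors with eigenvalues greater than 1 are retained. Alternatively, pass n_factors to Validity.efa(n_factors) to set the number of factors explicitly.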

In [8]:
efa = val_result['efa']
print(f"Metrics: {efa.keys()}")
Metrics: dict_keys(['n_factors', 'loadings', 'variance', 'communalities', 'kmo', 'bartlett', 'factor_structure', 'rotation', 'method'])
In [9]:
fig, ax = plt.subplots(figsize=(3, 2))
mat = efa['loadings']
cax = ax.imshow(mat, cmap="viridis", aspect="auto")
ax.set(title="EFA Loadings", xlabel="Factors", ylabel="Items")
fig.colorbar(cax, ax=ax, label="Loading Value")
plt.tight_layout()
plt.show()
[Figure: heatmap "EFA Loadings" (x: Factors, y: Items) with a "Loading Value" colorbar]
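
The default rule mentioned at the start of this section, retaining factors whose eigenvalue exceeds 1, is often called the Kaiser criterion. A minimal NumPy sketch of that rule is shown below. It counts eigenvalues of the item correlation matrix; whether the library applies the rule to exactly this matrix is an assumption, and `kaiser_n_factors` is a hypothetical name.

```python
import numpy as np


def kaiser_n_factors(data) -> int:
    """Count eigenvalues of the item correlation matrix that exceed 1 (sketch)."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)  # the correlation matrix is symmetric
    return int((eigvals > 1.0).sum())


# Toy data: two independent latent factors, three items each.
rng = np.random.default_rng(0)
factors = rng.normal(size=(1000, 2))
items = np.repeat(factors, 3, axis=1) + 0.5 * rng.normal(size=(1000, 6))
print(kaiser_n_factors(items))
```

The rule tends to be sensitive to the number of items, so comparing its suggestion with an explicit n_factors value is a reasonable sanity check.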

Confirmatory Factor Analysis (CFA)

In [20]:
struct = {'O': [f"Q{i}" for i in range(1, 17)],
          'C': [f"Q{i}" for i in range(17, 33)],
          'E': [f"Q{i}" for i in range(33, 49)],
          'A': [f"Q{i}" for i in range(49, 65)],
          'N': [f"Q{i}" for i in range(65, 81)]}
cfa = val.cfa(struct)
cfa.keys()
Out[20]:
dict_keys(['fit_indices', 'loadings', 'estimates', 'model', 'model_spec'])
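
The result includes `model` and `model_spec` entries, and a later cell passes `cfa['model']` to semopy. One plausible way to turn a factor-to-items dictionary like `struct` into semopy's lavaan-style measurement syntax is sketched below. This is an assumption about what such a specification looks like, not the library's confirmed behaviour, and `build_cfa_spec` is a hypothetical helper.

```python
def build_cfa_spec(structure: dict[str, list[str]]) -> str:
    """Render one `factor =~ item1 + item2 + ...` line per latent factor."""
    lines = [f"{factor} =~ {' + '.join(items)}" for factor, items in structure.items()]
    return "\n".join(lines)


# A shortened structure with two factors and three items each.
small_struct = {"O": ["Q1", "Q2", "Q3"], "C": ["Q4", "Q5", "Q6"]}
spec = build_cfa_spec(small_struct)
print(spec)
# O =~ Q1 + Q2 + Q3
# C =~ Q4 + Q5 + Q6
```

A string in this form is what `semopy.Model` accepts as a model description when working with semopy directly.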

Model Fit

In [21]:
cfa['fit_indices']
Out[21]:
        DoF  DoF Baseline         chi2  chi2 p-value  chi2 Baseline       CFI       GFI     AGFI       NFI       TLI     RMSEA         AIC         BIC     LogLik
Value  3070          3160  3763.604778  1.110223e-16   18157.694594  0.953753  0.792727  0.78665  0.792727  0.952397  0.033695  302.363952  863.077905  18.818024
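
Several of the indices above follow directly from the two chi-square statistics and their degrees of freedom. The sketch below uses the conventional formulas for CFI, TLI, NFI, and RMSEA. With the values from the table and the 200 subjects of this example, these formulas reproduce the reported numbers to the displayed precision, using n_samples - 1 in the RMSEA denominator. The function name is illustrative, and the code is not taken from semopy or psychometric.

```python
import math


def fit_indices(chi2: float, dof: int, chi2_base: float, dof_base: int, n_samples: int) -> dict:
    """Conventional chi-square based fit indices (illustrative sketch)."""
    excess = max(chi2 - dof, 0.0)                 # misfit of the tested model
    excess_base = max(chi2_base - dof_base, 0.0)  # misfit of the baseline model
    denom = max(excess_base, excess)
    cfi = 1.0 if denom == 0 else 1.0 - excess / denom
    ratio, ratio_base = chi2 / dof, chi2_base / dof_base
    tli = (ratio_base - ratio) / (ratio_base - 1.0)
    nfi = 1.0 - chi2 / chi2_base
    rmsea = math.sqrt(excess / (dof * (n_samples - 1)))
    return {"CFI": cfi, "TLI": tli, "NFI": nfi, "RMSEA": rmsea}


# Values copied from the fit table above; n_samples is the 200 generated subjects.
indices = fit_indices(3763.604778, 3070, 18157.694594, 3160, n_samples=200)
print({name: round(value, 6) for name, value in indices.items()})
```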

Factor Loadings

In [22]:
cfa['estimates']
Out[22]:
    lval  op  rval  Estimate  Std. Err    z-value  p-value
0     Q1   ~     O  1.000000         -          -        -
1     Q2   ~     O  1.104562  0.088425  12.491487      0.0
2     Q3   ~     O  1.240157  0.087992  14.093951      0.0
3     Q4   ~     O  1.324430  0.094921  13.952946      0.0
4     Q5   ~     O  1.278740  0.095589  13.377483      0.0
..   ...  ..   ...       ...       ...        ...      ...
170  Q78  ~~   Q78  0.116118  0.012229   9.495469      0.0
171  Q79  ~~   Q79  0.105641  0.011136   9.486145      0.0
172   Q8  ~~    Q8  0.105177  0.011216   9.377198      0.0
173  Q80  ~~   Q80  0.097943  0.010183   9.618539      0.0
174   Q9  ~~    Q9  0.077467  0.008534   9.078012      0.0

175 rows × 7 columns
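
The estimates table mixes several kinds of parameters. Judging from the rows shown, `~` rows with a factor in `rval` are loadings, and `~~` rows with the same name on both sides are residual variances. The Q1 loading is fixed at 1 to set the scale of the factor, which is why it has no standard error. The pandas sketch below separates the two kinds on a small hand-copied excerpt; it assumes only the column names visible above.

```python
import pandas as pd

# A few rows copied from the table above (Std. Err, z-value, p-value omitted).
estimates = pd.DataFrame(
    {
        "lval": ["Q1", "Q2", "Q3", "Q78", "Q79"],
        "op": ["~", "~", "~", "~~", "~~"],
        "rval": ["O", "O", "O", "Q78", "Q79"],
        "Estimate": [1.000000, 1.104562, 1.240157, 0.116118, 0.105641],
    }
)

# Loadings: an item regressed on a latent factor.
loadings = estimates[estimates["op"] == "~"]
# Residual variances: a variable's covariance with itself.
residual_var = estimates[(estimates["op"] == "~~") & (estimates["lval"] == estimates["rval"])]
# Items-by-factors layout, convenient for plotting or inspection.
loading_table = loadings.pivot(index="lval", columns="rval", values="Estimate")
print(loading_table)
```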

To visualize the structural equation model, install graphviz.

In [27]:
from semopy import semplot
semplot(cfa['model'], "example_cfa.png", std_ests=True, plot_covs=True)
Out[27]:
[Figure: path diagram of the fitted CFA model, saved as example_cfa.png]
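
Before calling `semplot`, it may be worth checking that graphviz is actually usable. The sketch below assumes that plotting needs both the `graphviz` Python package and the Graphviz `dot` executable on the PATH, which is the usual setup for Graphviz-based plotting but is not stated in this example. The helper name is hypothetical.

```python
import importlib.util
import shutil


def graphviz_available() -> dict[str, bool]:
    """Report whether the Python bindings and the `dot` binary can be found."""
    return {
        "python_package": importlib.util.find_spec("graphviz") is not None,
        "dot_executable": shutil.which("dot") is not None,
    }


status = graphviz_available()
print(status)
if not all(status.values()):
    print("graphviz looks incomplete; plotting the model may fail")
```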