(Finance and Insurance) Financial Data Mining and Applications: Course Assignment

Published: 2020-02-06 11:31:38   Source: 文档文库

Data Analysis Based on GLM (the General Linear Model)

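The GLM procedure in SAS is widely used in practice and applies to a broad range of data. Trend surface regression analysis (Trend Analysis) is a prediction and statistical technique built on multiple regression. It fits polynomial regressions in the spatial coordinates and picks the best regression model among them, which is why it is also called trend surface analysis. When it is not known whether the data are linearly or nonlinearly related, trend surface analysis can be used to find the statistical prediction model that fits the data best.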
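To make the polynomial structure concrete, the following Python sketch lists the terms of a complete trend surface of a given order, that is, every monomial x^i * y^j with 1 <= i + j <= order. The function names `trend_surface_terms` and `term_label` are illustrative only and are not part of SAS; the labels merely mimic the effect syntax of a MODEL statement.

```python
def trend_surface_terms(order):
    """Return (i, j) exponent pairs for all monomials x^i * y^j with 1 <= i + j <= order."""
    terms = []
    for total in range(1, order + 1):
        for j in range(total + 1):
            terms.append((total - j, j))
    return terms


def term_label(i, j):
    """Write x^i * y^j in the style of a SAS model effect, e.g. x*x*y."""
    return "*".join(["x"] * i + ["y"] * j)


for order in (1, 2, 3, 4):
    labels = [term_label(i, j) for i, j in trend_surface_terms(order)]
    print(order, len(labels), " ".join(labels))
```

Under this definition a complete surface of order k has k(k+3)/2 terms besides the intercept, so 2, 5, 9, and 14 terms for orders 1 to 4.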

This report applies PROC GLM to a small data set.

I. Data and Requirements

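The data cover 15 smokers with differing levels of consumption: daily cigarette consumption (x, cigarettes), daily beer consumption (y, liters), and the corresponding electrocardiogram index (zb). The task is to establish the relationship between zb and the two predictors x and y.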

II. Trend Surface Analysis with the GLM Procedure

1. GLM program for the trend analysis

data beer;

input obsn x y zb;

cards;

01 30 10 280

02 25 11 260

03 35 13 330

04 40 14 400

05 45 14 410

06 20 12 270

07 18 11 210

08 25 12 280

09 25 13 300

10 23 13 290

11 40 14 410

12 45 15 420

13 48 16 425

14 50 18 450

15 55 19 470

;

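/* First-order model; the P option prints observed, predicted, and residual values. */
proc glm data=beer;
model zb = x y / p;

/* Second-order model: adds the quadratic terms x*x, x*y, y*y. */
proc glm data=beer;
model zb = x y x*x x*y y*y / p;

/* Third-order model as fitted here: linear and cubic terms only (no quadratic terms). */
proc glm data=beer;
model zb = x y x*x*x x*x*y x*y*y y*y*y / p;

/* Fourth-order model as fitted here: linear, cubic, and quartic terms (no quadratic terms). */
proc glm data=beer;
model zb = x y x*x*x x*x*y x*y*y y*y*y x*x*x*x x*x*x*y x*x*y*y x*y*y*y y*y*y*y / p;

run;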
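As a cross-check outside SAS, the first-order model zb = b0 + b1*x + b2*y can be fitted by ordinary least squares in plain Python. This is a minimal sketch that solves the normal equations with Gaussian elimination; the helper names `solve` and `fit_first_order` are hypothetical, and the approach is chosen for transparency, not as a description of how PROC GLM computes its estimates internally.

```python
# Data copied from the DATA step above: (x, y, zb) for the 15 observations.
DATA = [
    (30, 10, 280), (25, 11, 260), (35, 13, 330), (40, 14, 400), (45, 14, 410),
    (20, 12, 270), (18, 11, 210), (25, 12, 280), (25, 13, 300), (23, 13, 290),
    (40, 14, 410), (45, 15, 420), (48, 16, 425), (50, 18, 450), (55, 19, 470),
]


def solve(a, b):
    """Solve the square system a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta


def fit_first_order(data):
    """Least-squares fit of zb = b0 + b1*x + b2*y via the normal equations X'X b = X'z."""
    rows = [(1.0, float(x), float(y)) for x, y, _ in data]
    zb = [float(z) for _, _, z in data]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtz = [sum(r[i] * z for r, z in zip(rows, zb)) for i in range(3)]
    return solve(xtx, xtz)


b0, b1, b2 = fit_first_order(DATA)
print(f"Intercept={b0:.4f}  x={b1:.4f}  y={b2:.4f}")
```

The printed coefficients should agree with the Parameter Estimate column of the first-order listing in the next section.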

2. Results of the four models

(1) First-order trend model

Dependent Variable: zb

Source DF Sum of Squares Mean Square F Value Pr > F

Model 2 90615.20993 45307.60497 127.19 <.0001

Error 12 4274.79007 356.23251

Corrected Total 14 94890.00000

R-Square Coeff Var Root MSE zb Mean

0.954950 5.439228 18.87412 347.000

---------------------------------------------------------------------------------------------------------------------------------

Source DF Type I SS Mean Square F Value Pr > F

x 1 89541.56558 89541.56558 251.36 <.0001

y 1 1073.64435 1073.64435 3.01 0.1081

---------------------------------------------------------------------------------------------------------------------------------

Source DF Type III SS Mean Square F Value Pr > F

x 1 14652.24351 14652.24351 41.13 <.0001

y 1 1073.64435 1073.64435 3.01 0.1081

---------------------------------------------------------------------------------------------------------------------------------

Parameter Estimate Standard Error t Value Pr > |t|

Intercept 64.04999380 33.06539919 1.94 0.0766

x 5.38385565 0.83947567 6.41 <.0001

y 6.94199869 3.99872078 1.74 0.1081

Observation Observed Predicted Residual

1 280.0000000 294.9856503 -14.9856503

2 260.0000000 275.0083707 -15.0083707

3 330.0000000 342.7309246 -12.7309246

4 400.0000000 376.5922015 23.4077985

5 410.0000000 403.5114798 6.4885202

6 270.0000000 255.0310911 14.9689089

7 210.0000000 237.3213811 -27.3213811

8 280.0000000 281.9503694 -1.9503694

9 300.0000000 288.8923681 11.1076319

10 290.0000000 278.1246568 11.8753432

11 410.0000000 376.5922015 33.4077985

12 420.0000000 410.4534785 9.5465215

13 425.0000000 433.5470441 -8.5470441

14 450.0000000 458.1987528 -8.1987528

15 470.0000000 492.0600298 -22.0600298

---------------------------------------------------------------------------------------------------------------------------------

Sum of Residuals -0.000000

Sum of Squared Residuals 4274.790069

Sum of Squared Residuals - Error SS -0.000000

First Order Autocorrelation 0.235461

Durbin-Watson D 1.362704
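The summary statistics in the first-order listing follow directly from the sums of squares. The short Python sketch below recomputes them from the figures printed above; the variable names are illustrative, and the formulas are the standard ANOVA definitions (mean square = SS / DF, F = MS model / MS error, R-Square = SS model / SS total, Coeff Var = 100 * Root MSE / mean).

```python
import math

# Values copied from the first-order listing above.
ss_model, df_model = 90615.20993, 2
ss_error, df_error = 4274.79007, 12
zb_mean = 347.0

ms_model = ss_model / df_model             # mean square for the model
ms_error = ss_error / df_error             # mean square error (MSE)
f_value = ms_model / ms_error              # overall F statistic
r_square = ss_model / (ss_model + ss_error)
root_mse = math.sqrt(ms_error)
coeff_var = 100.0 * root_mse / zb_mean     # coefficient of variation, in percent

print(f"F={f_value:.2f}  R-Square={r_square:.6f}  "
      f"Root MSE={root_mse:.5f}  Coeff Var={coeff_var:.6f}")
```

The same arithmetic applies to the other three listings with their own sums of squares and degrees of freedom.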

(2) Second-order trend model

Dependent Variable: zb

Source DF Sum of Squares Mean Square F Value Pr > F

Model 5 93330.83580 18666.16716 107.75 <.0001

Error 9 1559.16420 173.24047

Corrected Total 14 94890.00000

R-Square Coeff Var Root MSE zb Mean

0.983569 3.793108 13.16208 347.0000

--------------------------------------------------------------------------------------------------------------------------------

Source DF Type I SS Mean Square F Value Pr > F

x 1 89541.56558 89541.56558 516.86 <.0001

y 1 1073.64435 1073.64435 6.20 0.0345

x*x 1 1892.86626 1892.86626 10.93 0.0091

x*y 1 772.91658 772.91658 4.46 0.0638

y*y 1 49.84303 49.84303 0.29 0.6047

Source DF Type III SS Mean Square F Value Pr > F

x 1 965.2913631 965.2913631 5.57 0.0426

y 1 127.4395437 127.4395437 0.74 0.4133

x*x 1 43.6622972 43.6622972 0.25 0.6277

x*y 1 242.0343234 242.0343234 1.40 0.2675

y*y 1 49.8430316 49.8430316 0.29 0.6047

Parameter Estimate Standard Error t Value Pr > |t|

Intercept -262.7664793 109.1074817 -2.41 0.0394

x 16.0699779 6.8078620 2.36 0.0426

y 23.5391327 27.4449867 0.86 0.4133

x*x 0.0638773 0.1272383 0.50 0.6277

x*y -1.1651016 0.9857119 -1.18 0.2675

y*y 1.1673362 2.1762982 0.54 0.6047

---------------------------------------------------------------------------------------------------------------------------------

Observation Observed Predicted Residual

1 280.0000000 279.4168700 0.5831300

2 260.0000000 258.6814596 1.3185404

3 330.0000000 351.0997183 -21.0997183

4 400.0000000 388.1251282 11.8748718

5 410.0000000 414.0657505 -4.0657505

6 270.0000000 255.1256024 14.8743976

7 210.0000000 216.6773768 -6.6773768

8 280.0000000 279.9417834 0.0582166

9 300.0000000 303.5367795 -3.5367795

10 290.0000000 295.5572467 -5.5572467

11 410.0000000 388.1251282 21.8748718

12 420.0000000 419.0280585 0.9719415

13 425.0000000 436.4318573 -11.4318573

14 450.0000000 453.7554706 -3.7554706

15 470.0000000 465.4317699 4.5682301

---------------------------------------------------------------------------------------------------------------------------------

Sum of Residuals -0.000000

Sum of Squared Residuals 1559.164195

Sum of Squared Residuals - Error SS -0.000000

First Order Autocorrelation -0.354205

Durbin-Watson D 2.694808

(3) Third-order trend model

Dependent Variable: zb

Source DF Sum of Squares Mean Square F Value Pr > F

Model 6 93393.46414 15565.57736 83.21 <.0001

Error 8 1496.53586 187.06698

Corrected Total 14 94890.00000

R-Square Coeff Var Root MSE zb Mean

0.984229 3.941569 13.67724 347.0000

Source DF Type I SS Mean Square F Value Pr > F

x 1 89541.56558 89541.56558 478.66 <.0001

y 1 1073.64435 1073.64435 5.74 0.0435

x*x*x 1 2078.77664 2078.77664 11.11 0.0103

x*x*y 1 508.85526 508.85526 2.72 0.1377

x*y*y 1 17.50614 17.50614 0.09 0.7675

y*y*y 1 173.11616 173.11616 0.93 0.3642

---------------------------------------------------------------------------------------------------------------------------------

Source DF Type III SS Mean Square F Value Pr > F

x 1 1643.347081 1643.347081 8.78 0.0180

y 1 197.474017 197.474017 1.06 0.3343

x*x*x 1 105.516422 105.516422 0.56 0.4741

x*x*y 1 113.710330 113.710330 0.61 0.4580

x*y*y 1 146.610010 146.610010 0.78 0.4018

y*y*y 1 173.116161 173.116161 0.93 0.3642

Parameter Estimate Standard Error t Value Pr > |t|

Intercept -166.0074589 82.37772231 -2.02 0.0786

x 11.1382598 3.75795233 2.96 0.0180

y 15.7784340 15.35703905 1.03 0.3343

x*x*x -0.0154132 0.02052250 -0.75 0.4741

x*x*y 0.1203187 0.15432333 0.78 0.4580

x*y*y -0.3416786 0.38595313 -0.89 0.4018

y*y*y 0.3134894 0.32587614 0.96 0.3642

Observation Observed Predicted Residual

1 280.0000000 281.0906363 -1.0906363

2 260.0000000 256.0483783 3.9516217

3 330.0000000 351.8935219 -21.8935219

4 400.0000000 390.5707896 9.4292104

5 410.0000000 409.2309652 0.7690348

6 270.0000000 257.9983490 12.0016510

7 210.0000000 220.0483966 -10.0483966

8 280.0000000 275.0160368 4.9839632

9 300.0000000 299.4709973 0.5290027

10 290.0000000 295.8228899 -5.8228899

11 410.0000000 390.5707896 19.4292104

12 420.0000000 420.5758580 -0.5758580

13 425.0000000 437.4437284 -12.4437284

14 450.0000000 455.6875798 -5.6875798

15 470.0000000 463.5310833 6.4689167

---------------------------------------------------------------------------------------------------------------------------------

Sum of Residuals -0.000000

Sum of Squared Residuals 1496.535862

Sum of Squared Residuals - Error SS -0.000000

First Order Autocorrelation -0.357545

Durbin-Watson D 2.686333

--------------------------------------------------------------------------------------------------------------------------------

(4) Fourth-order trend model

Dependent Variable: zb

Source DF Sum of Squares Mean Square F Value Pr > F

Model 11 94480.31919 8589.11993 62.90 0.0029

Error 3 409.68081 136.56027

Corrected Total 14 94890.00000

R-Square Coeff Var Root MSE zb Mean

0.995683 3.367695 11.68590 347.0000

Source DF Type I SS Mean Square F Value Pr > F

x 1 89541.56558 89541.56558 655.69 0.0001

y 1 1073.64435 1073.64435 7.86 0.0676

x*x*x 1 2078.77664 2078.77664 15.22 0.0299

x*x*y 1 508.85526 508.85526 3.73 0.1491

x*y*y 1 17.50614 17.50614 0.13 0.7440

y*y*y 1 173.11616 173.11616 1.27 0.3421

x*x*x*x 1 52.91566 52.91566 0.39 0.5777

x*x*x*y 1 193.81980 193.81980 1.42 0.3192

x*x*y*y 1 452.42798 452.42798 3.31 0.1663

x*y*y*y 1 40.32879 40.32879 0.30 0.6246

y*y*y*y 1 347.36281 347.36281 2.54 0.2090

---------------------------------------------------------------------------------------------------------------------------------

Source DF Type III SS Mean Square F Value Pr > F

x 1 53.8347354 53.8347354 0.39 0.5746

y 1 18.4422458 18.4422458 0.14 0.7376

x*x*x 1 707.3985134 707.3985134 5.18 0.1073

x*x*y 1 688.7276032 688.7276032 5.04 0.1104

x*y*y 1 669.2155979 669.2155979 4.90 0.1137

y*y*y 1 614.9897506 614.9897506 4.50 0.1239

x*x*x*x 1 73.5254957 73.5254957 0.54 0.5162

x*x*x*y 1 21.5720987 21.5720987 0.16 0.7176

x*x*y*y 1 150.8940383 150.8940383 1.10 0.3704

x*y*y*y 1 264.7516451 264.7516451 1.94 0.2581

y*y*y*y 1 347.3628138 347.3628138 2.54 0.2090

Parameter Estimate Standard Error t Value Pr > |t|

Intercept -748.5352475 602.9093096 -1.24 0.3026

x 21.5268501 34.2855706 0.63 0.5746

y 63.4532525 172.6669316 0.37 0.7376

x*x*x 1.1129083 0.4889782 2.28 0.1073

x*x*y -7.8466442 3.4939960 -2.25 0.1104

x*y*y 17.6919599 7.9919932 2.21 0.1137

y*y*y -12.8173180 6.0398396 -2.12 0.1239

x*x*x*x -0.0052895 0.0072088 -0.73 0.5162

x*x*x*y -0.0339628 0.0854515 -0.40 0.7176

x*x*y*y 0.4218127 0.4012785 1.05 0.3704

x*y*y*y -1.0952733 0.7866207 -1.39 0.2581

y*y*y*y 0.8411079 0.5273783 1.59 0.2090

Observation Observed Predicted Residual

1 280.0000000 280.6428697 -0.6428697

2 260.0000000 254.9148649 5.0851351

3 330.0000000 336.2353148 -6.2353148

4 400.0000000 399.8451524 0.1548476

5 410.0000000 409.0029100 0.9970900

6 270.0000000 265.5623644 4.4376356

7 210.0000000 212.0079405 -2.0079405

8 280.0000000 287.4716063 -7.4716063

9 300.0000000 292.6701245 7.3298755

10 290.0000000 295.8090433 -5.8090433

11 410.0000000 399.8451524 10.1548476

12 420.0000000 428.1747562 -8.1747562

13 425.0000000 422.5228478 2.4771522

14 450.0000000 450.5733972 -0.5733972

15 470.0000000 469.7216557 0.2783443

---------------------------------------------------------------------------------------------------------------------------------

Sum of Residuals 0.0000000

Sum of Squared Residuals 409.6807042

Sum of Squared Residuals - Error SS -0.0001104

First Order Autocorrelation -0.6992027

Durbin-Watson D 3.3972074

---------------------------------------------------------------------------------------------------------------------------------

III. Analysis of Results

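The main statistics from the four analyses, taken from the listings above, are collected in the following table:

Model          Model DF   Error DF   F Value   Pr > F    R-Square   Coeff Var   Root MSE   Durbin-Watson D
First-order    2          12         127.19    <.0001    0.954950   5.439228    18.87412   1.362704
Second-order   5          9          107.75    <.0001    0.983569   3.793108    13.16208   2.694808
Third-order    6          8          83.21     <.0001    0.984229   3.941569    13.67724   2.686333
Fourth-order   11         3          62.90     0.0029    0.995683   3.367695    11.68590   3.397207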

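All four models are significant at α = 0.05. The selection rule is to look first for the smallest model p-value, which rules out the fourth-order model (p = 0.0029, against < .0001 for the other three), and then to take the largest coefficient of determination among the remaining models, which selects the third-order model.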
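The selection rule just described can be written out as a small Python sketch. The p-values and R-Square values are copied from the four listings; storing a p-value printed as "<.0001" as its upper bound 0.0001 is an assumption made here so the values can be compared, and `select_model` is a hypothetical helper name.

```python
# Model p-value and R-Square from the four listings.
# "<.0001" is stored as 0.0001, its upper bound (an assumption for comparison).
MODELS = {
    "first-order": {"p_value": 0.0001, "r_square": 0.954950},
    "second-order": {"p_value": 0.0001, "r_square": 0.983569},
    "third-order": {"p_value": 0.0001, "r_square": 0.984229},
    "fourth-order": {"p_value": 0.0029, "r_square": 0.995683},
}


def select_model(models, alpha=0.05):
    """Keep significant models, then the smallest p-value, then the largest R-Square."""
    significant = {name: s for name, s in models.items() if s["p_value"] < alpha}
    if not significant:
        return None  # no model passes the significance screen
    smallest_p = min(s["p_value"] for s in significant.values())
    finalists = [n for n, s in significant.items() if s["p_value"] == smallest_p]
    return max(finalists, key=lambda n: significant[n]["r_square"])


print(select_model(MODELS))
```

Because three models tie on the printed p-value, the R-Square comparison is what actually decides between them under this rule.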

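The third-order coefficient of determination is larger than the second-order one, though only slightly (0.984229 against 0.983569). Its drawbacks are that the root mean square error and the coefficient of variation are both somewhat larger (13.67724 against 13.16208, and 3.941569 against 3.793108), and that the residual independence check is not well satisfied (Durbin-Watson D = 2.686333).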
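The residual independence check refers to the Durbin-Watson statistic, D = sum((e_t - e_{t-1})^2) / sum(e_t^2), where values near 2 indicate no first-order autocorrelation in the residuals. The Python sketch below recomputes D from the residual columns of the first-order and third-order listings; `durbin_watson` is an illustrative helper, not a SAS function.

```python
# Residuals copied from the first-order and third-order listings, in observation order.
RESID_FIRST = [
    -14.9856503, -15.0083707, -12.7309246, 23.4077985, 6.4885202,
    14.9689089, -27.3213811, -1.9503694, 11.1076319, 11.8753432,
    33.4077985, 9.5465215, -8.5470441, -8.1987528, -22.0600298,
]
RESID_THIRD = [
    -1.0906363, 3.9516217, -21.8935219, 9.4292104, 0.7690348,
    12.0016510, -10.0483966, 4.9839632, 0.5290027, -5.8228899,
    19.4292104, -0.5758580, -12.4437284, -5.6875798, 6.4689167,
]


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


dw_first = durbin_watson(RESID_FIRST)
dw_third = durbin_watson(RESID_THIRD)
print(f"first-order D={dw_first:.6f}  third-order D={dw_third:.6f}")
```

The statistic depends on the order of the observations, so the residuals must stay in the order in which they appear in the listing.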

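Therefore, the third-order regression is adopted for these data. With the coefficients taken from the parameter estimates of the third-order listing, the prediction model for the electrocardiogram index (zb) is:

$$\mathrm{zb} = -166.0075 + 11.1383x + 15.7784y - 0.015413x^{3} + 0.120319x^{2}y - 0.341679xy^{2} + 0.313489y^{3}$$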
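As a usage illustration, the third-order prediction model can be evaluated in Python with the parameter estimates from the third-order listing. The function name `predict_zb` is hypothetical. The cubic terms multiply values as large as 55^3, so the coefficients are kept at the precision printed in the listing; rounding them to two or three decimals would shift the prediction noticeably.

```python
# Parameter estimates copied from the third-order listing.
COEF = {
    "intercept": -166.0074589,
    "x": 11.1382598,
    "y": 15.7784340,
    "x*x*x": -0.0154132,
    "x*x*y": 0.1203187,
    "x*y*y": -0.3416786,
    "y*y*y": 0.3134894,
}


def predict_zb(x, y):
    """Evaluate the fitted third-order model at daily cigarettes x and daily beer y."""
    return (
        COEF["intercept"]
        + COEF["x"] * x
        + COEF["y"] * y
        + COEF["x*x*x"] * x ** 3
        + COEF["x*x*y"] * x ** 2 * y
        + COEF["x*y*y"] * x * y ** 2
        + COEF["y*y*y"] * y ** 3
    )


# Observation 1 (x=30, y=10) and observation 15 (x=55, y=19) from the data set.
print(round(predict_zb(30, 10), 2), round(predict_zb(55, 19), 2))
```

The results should match the Predicted column of the third-order listing up to the rounding of the printed coefficients.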

Source: https://www.2haoxitong.net/k/doc/feb5f701ad45b307e87101f69e3143323968f5af.html
