Statistical learning for data processing (scikit-learn tutorial)

Published: 2020-12-24 15:05:33 | Category: Big Data | Source: compiled from the web

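Overview: Scikit-learn is a Python module that integrates tightly with the Python scientific computing libraries (NumPy, SciPy, matplotlib) and bundles classic machine learning algorithms. 1. Statistical learning: the setting and the estimator object in scikit-learn. (1) Datasets: from 2D arrays, scikit-learn …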

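Independent component analysis: ICA
Independent component analysis (ICA) selects components so that their distributions carry a maximum amount of independent information. It can recover non-Gaussian independent signals: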

import numpy as np
from sklearn import decomposition

# Generate sample data
time = np.linspace(0, 10, 2000)
s1 = np.sin(2 * time)  # Signal 1: sinusoidal signal
s2 = np.sign(np.sin(3 * time))  # Signal 2: square signal
S = np.c_[s1, s2]
S += 0.2 * np.random.normal(size=S.shape)  # Add noise
S /= S.std(axis=0)  # Standardize data

# Mix data
A = np.array([[1, 1], [0.5, 2]])  # Mixing matrix
X = np.dot(S, A.T)  # Generate observations

# Compute ICA
ica = decomposition.FastICA()
S_ = ica.fit_transform(X)  # Get the estimated sources
A_ = ica.mixing_.T
# The estimated sources and mixing matrix should reconstruct the observations
np.allclose(X, np.dot(S_, A_) + ica.mean_)
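
ICA recovers the sources only up to permutation, sign and scale, so one way to check the result is to correlate every estimated source with every true one. The sketch below is a self-contained variant of the ICA example; the helper name `match_sources`, the fixed random seed and the `random_state=0` setting are assumptions added for reproducibility and are not part of the original tutorial.

```python
import numpy as np
from sklearn.decomposition import FastICA


def match_sources(S_true, S_est):
    """Return |correlation| between each true source (rows) and each estimate (columns)."""
    n = S_true.shape[1]
    # np.corrcoef treats rows as variables, so pass (n_sources, n_samples) arrays.
    corr = np.corrcoef(S_true.T, S_est.T)
    # The upper-right block holds the true-vs-estimated correlations.
    return np.abs(corr[:n, n:])


rng = np.random.RandomState(0)  # fixed seed (assumption, for reproducibility)
time = np.linspace(0, 10, 2000)
S = np.c_[np.sin(2 * time), np.sign(np.sin(3 * time))]
S += 0.2 * rng.normal(size=S.shape)  # add noise
S /= S.std(axis=0)  # standardize

A = np.array([[1, 1], [0.5, 2]])  # 2x2 mixing matrix
X = S @ A.T  # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)

corr = match_sources(S, S_est)
print(np.round(corr, 2))
```

If the separation worked, each row of the printed matrix has one entry close to 1, and those entries sit in different columns.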

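5. Putting it all together

(1) Pipelining

We have seen that some estimators can transform data and that others can predict variables. We can also combine the two into a single estimator: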

import numpy as np
import matplotlib.pyplot as plt

from sklearn import linear_model, decomposition, datasets
from sklearn.pipeline import Pipeline
# GridSearchCV lives in sklearn.model_selection (sklearn.grid_search was removed in 0.20)
from sklearn.model_selection import GridSearchCV

logistic = linear_model.LogisticRegression()
pca = decomposition.PCA()
pipe = Pipeline(steps=[('pca', pca), ('logistic', logistic)])

digits = datasets.load_digits()
X_digits = digits.data
y_digits = digits.target

# Plot the PCA spectrum
pca.fit(X_digits)

plt.figure(1, figsize=(4, 3))
plt.clf()
plt.axes([.2, .2, .7, .7])
plt.plot(pca.explained_variance_, linewidth=2)
plt.axis('tight')
plt.xlabel('n_components')
plt.ylabel('explained_variance_')

# Prediction
n_components = [20, 40, 64]
Cs = np.logspace(-4, 4, 3)

# Parameters of pipelines can be set using '__' separated parameter names:
estimator = GridSearchCV(pipe,
                         dict(pca__n_components=n_components,
                              logistic__C=Cs))
estimator.fit(X_digits, y_digits)

# Mark the number of components chosen by the grid search
plt.axvline(estimator.best_estimator_.named_steps['pca'].n_components,
            linestyle=':', label='n_components chosen')
plt.legend(prop=dict(size=12))
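
The `'__'` naming convention used for the grid search also works outside of it. As a minimal sketch (the step names `'pca'` and `'logistic'` mirror the pipeline in this section; `max_iter=1000`, `cv=3` and the parameter values are assumptions chosen only to keep the example quick), the following sets nested parameters with `set_params` and scores the pipeline with cross-validation:

```python
from sklearn import datasets, decomposition, linear_model
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X_digits, y_digits = datasets.load_digits(return_X_y=True)

pipe = Pipeline(steps=[
    ('pca', decomposition.PCA()),
    ('logistic', linear_model.LogisticRegression(max_iter=1000)),
])

# '<step name>__<parameter name>' addresses a parameter of a nested step.
pipe.set_params(pca__n_components=20, logistic__C=1.0)

# The same names appear as keys in get_params().
params = pipe.get_params()
print(params['pca__n_components'], params['logistic__C'])

# cross_val_score clones and refits the whole pipeline on each training fold,
# so the PCA projection is never fit on the held-out fold.
scores = cross_val_score(pipe, X_digits, y_digits, cv=3)
print(scores.mean())
```

Because the transformer and the classifier are wrapped in one estimator, any tool that accepts an estimator (such as `cross_val_score` or `GridSearchCV`) can treat the whole chain as a single model.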

(2) Face recognition with eigenfaces

The dataset used in this example is a preprocessed excerpt of "Labeled Faces in the Wild", better known as LFW.

http://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz (233 MB)
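
A common eigenfaces recipe is to project the images onto their leading principal components and to classify in that reduced space. The sketch below is one assumed way to set this up, not the original tutorial's code: the helper name `make_eigenface_classifier`, the choice of an RBF `SVC` and all parameter values are illustrative. To stay runnable offline it is exercised on the bundled digits images as a stand-in; for LFW, `sklearn.datasets.fetch_lfw_people` returns arrays that could be passed in the same way (it downloads the data on first use).

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def make_eigenface_classifier(n_components=30):
    """PCA projection ("eigenfaces" when fit on face images) followed by an SVM."""
    return Pipeline(steps=[
        ('pca', PCA(n_components=n_components, whiten=True, random_state=0)),
        ('svc', SVC(kernel='rbf', class_weight='balanced')),
    ])


# Stand-in data (assumption): 8x8 digit images instead of LFW faces.
# For LFW one could instead use, e.g.:
#   from sklearn.datasets import fetch_lfw_people
#   faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
#   X, y = faces.data, faces.target
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = make_eigenface_classifier(n_components=30)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Each row of components_ is one principal axis in pixel space; on face
# images these rows, reshaped to the image size, are the "eigenfaces".
components = clf.named_steps['pca'].components_
print(components.shape, accuracy)
```

On real face images the number of components and the SVM parameters would normally be tuned, for instance with a grid search over the pipeline's `'__'` parameter names.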

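6. Finding help

(1) The project mailing list
If you run into a bug in scikit-learn, or find something in the documentation that needs clarification, feel free to ask on the mailing list [maillist].

(2) Q&A communities with machine learning practitioners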

Metaoptimize/QA:
A forum for discussing machine learning, natural language processing and other data analysis topics (similar to what Stack Overflow is for developers): http://metaoptimize.com/qa

A good thread to start with: good freely available textbooks on machine learning

Quora.com:
Quora has a topic for machine learning questions, with many interesting discussions: http://quora.com/Machine-learning

Have a look at the best questions section, for example: What are some good resources for learning about machine learning

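--- An excellent free online course on machine learning taught by Professor Andrew Ng of Stanford
(Also available on NetEase Open Courses; search for "machine learning".)
--- An excellent online course oriented more toward artificial intelligence (AI):
http://www.udacity.com/overview/Course/cs271/CourseRev/1

Source: http://www.cnblogs.com/taceywong/p/4570155.html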

(Editor: 西安站长网)
