Dear All,
1. Why ML
By default, computers cannot learn. ML algorithms make it possible for them to learn from data.
2. What is a model
A model learns from data and can then answer questions about new data. The human brain is the best analogy.
3. What is training data
The data used for learning, that is, for training the model.
4. What is test data
The data used to test the model and measure how well it performs.
5. What is labelled data
Data that comes with its answer or label.
6. What is accuracy
How accurate the model is on the test data: the share of test samples it predicts correctly.
e.g. 90% is good
40% is poor
To calculate the accuracy we need test data with the answer (also called label, key or tag).
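As a minimal sketch of the calculation, accuracy is the number of correct predictions divided by the number of test samples. The helper name accuracy and the ten labels below are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical answers for ten test samples (1 = positive, 0 = negative).
y_test = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical model predictions: only the fifth one is wrong.
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]

acc = accuracy(y_test, y_pred)
print(f"accuracy = {acc:.0%}")  # 9 correct out of 10
```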
7. What are CSV files
CSV stands for comma-separated values, a portable plain-text data format that works across different operating systems.
CSV files can also be opened in Excel.
8. Important packages in ML
pandas, numpy, matplotlib
9. Use of pandas
Reading and processing CSV files
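A small sketch of reading CSV data with pandas. The three rows and the column names are made up, and the text is read from memory with io.StringIO so the example runs without a file; with a real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a real file on disk.
csv_text = """Glucose,BMI,Outcome
148,33.6,1
85,26.6,0
183,23.3,1
"""

# pd.read_csv accepts a file path or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column names taken from the header line
print(df.head())         # first rows of the table
```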
10. Use of numpy
Advanced numerical computations
11. Use of matplotlib
Visualization using graphs, charts etc.
12. What X and y indicate in ML
X - indicates the data part (the feature columns)
y - indicates the label part
13. X_train, X_test, y_train, y_test = train_test_split(diabetes.loc[:, diabetes.columns != 'Outcome'], diabetes['Outcome'], stratify=diabetes['Outcome'], random_state=66)
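diabetes.loc[:, diabetes.columns != 'Outcome'] - here ':' selects all rows, and diabetes.columns != 'Outcome' selects every column except the Outcome column. This is the data part, X.
diabetes['Outcome'] - indicates the label part, which is y.
stratify=diabetes['Outcome'] keeps the proportion of each Outcome value the same in the training and test parts.
random_state=66 is the seed for the random shuffling, so the same split is produced on every run. It does not set the size of the split. Because test_size is not given, train_test_split uses its default: test data will be 25% of the total data and training data 75%.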
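The split can be checked on a small table. This is only a sketch: the eight rows below are made up and stand in for the real diabetes data, which has more rows and columns.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in for the diabetes table: 2 feature columns and the label column.
diabetes = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137, 116, 78, 115],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1, 25.6, 31.0, 35.3],
    "Outcome": [1, 0, 1, 0, 1, 0, 1, 0],
})

X = diabetes.loc[:, diabetes.columns != "Outcome"]  # all rows, every column except Outcome
y = diabetes["Outcome"]                             # the label column

# test_size is not given, so the default of 25% test data applies.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=66
)
print(list(X.columns))
print(len(X_train), len(X_test))

# The same random_state reproduces the same split.
X_train_again, _, _, _ = train_test_split(X, y, stratify=y, random_state=66)
same_split = X_train.index.equals(X_train_again.index)
print(same_split)
```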
14. from sklearn.neighbors import KNeighborsClassifier
sklearn (scikit-learn) is the package which contains ML algorithms.
Here KNeighborsClassifier is imported from the neighbors subpackage of sklearn.
15. knn = KNeighborsClassifier(n_neighbors=9)
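Creates a variable knn that holds a KNeighborsClassifier model, with the model parameter n_neighbors set to 9.
16. What is a model parameter
Each algorithm has parameters or settings that control how it behaves. For KNeighborsClassifier, n_neighbors is the number of nearest training samples that vote on each prediction.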
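A short sketch of inspecting a model parameter. The variable name default_knn is made up for the comparison; get_params() is the standard scikit-learn method that returns an estimator's settings as a dictionary.

```python
from sklearn.neighbors import KNeighborsClassifier

default_knn = KNeighborsClassifier()        # no parameter given: n_neighbors defaults to 5
knn = KNeighborsClassifier(n_neighbors=9)   # the setting used in these notes

# get_params() lists every setting of the model.
print(default_knn.get_params()["n_neighbors"])
print(knn.get_params()["n_neighbors"])
```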
17. knn.fit(X_train, y_train)
Trains the model with the data (X_train) and the labels/answers (y_train).
18. knn.score(X_test, y_test)
This is the testing process for calculating the score/accuracy of the model.
Step 1 - the model predicts labels for the unseen data X_test.
Step 2 - the result obtained from step 1 is y_pred.
Step 3 - y_pred is compared with y_test; the fraction that match is the accuracy.
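The three steps can be written out by hand and compared with knn.score. This is a sketch on synthetic two-cluster data generated with numpy, not the diabetes data, and the every-fourth-row split is an arbitrary choice to keep the example short.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data: two clusters of 50 points each, labelled 0 and 1.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50, 2)),
    rng.normal(loc=3.0, scale=1.0, size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)

# Simple manual split: every fourth row goes to the test set.
test_mask = np.arange(len(y)) % 4 == 0
X_train, y_train = X[~test_mask], y[~test_mask]
X_test, y_test = X[test_mask], y[test_mask]

knn = KNeighborsClassifier(n_neighbors=9)
knn.fit(X_train, y_train)

y_pred = knn.predict(X_test)               # steps 1 and 2: predict labels for X_test
manual_score = np.mean(y_pred == y_test)   # step 3: fraction of matching labels
library_score = knn.score(X_test, y_test)  # the same three steps in one call

print(manual_score, library_score)
```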
19. Why we calculate accuracy
We calculate the accuracy of a model to judge whether it is fit for deployment.
If the accuracy is decent, the model can be applied in real-world applications.
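20. Deploy
new_data=[[4,111,92,0,0,36.6,0.190,31]]
ans=knn.predict(new_data)
Here the unknown data is stored in the variable 'new_data'. It is a list containing one row of feature values, in the same order as the columns of X.
The 'predict()' function runs the data through the trained model and returns the predicted label, which is stored in the variable 'ans'.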
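A minimal sketch of prediction on new data. The six training rows and the two feature columns (glucose, BMI) are made up; the point is the shape of new_data (a list of rows) and of the answer (one label per row).

```python
from sklearn.neighbors import KNeighborsClassifier

# Made-up training data: each row is [glucose, bmi], each label is 0 or 1.
X_train = [[85, 22.0], [90, 24.5], [95, 23.1],
           [160, 35.2], [170, 38.0], [180, 33.3]]
y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Outer list = rows (samples), inner list = feature values of one sample.
new_data = [[165, 36.0]]
ans = knn.predict(new_data)

print(ans)     # array with one predicted label per row of new_data
print(ans[0])  # the label for the single row
```

Because predict() returns one label per row, several unknown samples can be passed at once by adding more inner lists to new_data.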
21. Full cycle
1. Download the data and upload it to Colab.
2. Split the data into training and test sets.
3. Create a model variable.
4. Train the model with the training data.
5. Find the accuracy of the model using the score function.
6. Deploy the model with unknown data using the predict function and show the answer to the user.
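The full cycle can be sketched in a few lines. As an assumption for the sake of a runnable example, step 1 is replaced by synthetic data from make_classification with eight feature columns; with the real data you would load the uploaded CSV file with pandas instead.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1. Data (synthetic stand-in for the uploaded CSV file): 400 rows, 8 features.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)

# 2. Split the data into training and test sets (default: 25% test).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=66
)

# 3. Create a model variable.
knn = KNeighborsClassifier(n_neighbors=9)

# 4. Train the model with the training data.
knn.fit(X_train, y_train)

# 5. Find the accuracy of the model using the score function.
acc = knn.score(X_test, y_test)
print(f"accuracy: {acc:.2f}")

# 6. Predict for unknown data (here one test row is reused as the "new" sample).
new_data = X_test[:1]
ans = knn.predict(new_data)
print("prediction:", ans[0])
```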