Wednesday, 11 January 2023

Machine Learning Viva questions

 Dear All,

1. Why ML

By default, machines or computers cannot learn. ML algorithms let a computer learn patterns from data instead of being programmed with explicit rules for every case.

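2. What is a model

A model is what an ML algorithm produces after training: it takes input data and returns an answer (a prediction). As an analogy, the human brain learns from experience and then answers questions.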


3. What is training data

The data used to train (fit) the model.

4. What is test data

Data kept aside from training and used to measure how well the model performs on examples it has not seen.

5. What is labelled data

Data in which every example comes with its correct answer (label).

6. What is accuracy

The fraction of test examples that the model predicts correctly.

eg 90% is usually considered good

40% is poor (what counts as good depends on the problem)

To calculate the accuracy we need test data with the answer/label/key/tag
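The calculation itself is just a count of matching answers. A minimal sketch, assuming made-up labels and predictions for six test examples (the values below are illustrative, not from the diabetes data):

```python
# Hypothetical true labels (the answer key) and model predictions.
y_test = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]

# Count the positions where the prediction matches the true label.
correct = sum(1 for truth, guess in zip(y_test, y_pred) if truth == guess)

# Accuracy is the share of correct predictions among all test examples.
accuracy = correct / len(y_test)
print(f"{correct} of {len(y_test)} correct, accuracy = {accuracy:.2%}")
```

Without the answer key in y_test there would be nothing to compare the predictions against, which is why labelled test data is required.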


7. What are CSV files

Comma-Separated Values: a portable plain-text data format that works across different operating systems.

CSV files can also be opened in Excel.


8. Important packages in ML

pandas, numpy, matplotlib


9. Use of pandas

Reading and processing CSV files
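As a small illustration, the sketch below reads CSV text with pandas. The text is held in a string so the example runs on its own; the column names Glucose and BMI and all the values are assumptions made up for the example. With a real file you would pass the file name to pd.read_csv instead.

```python
import io

import pandas as pd

# Made-up CSV content: a header row followed by three data rows.
csv_text = """Glucose,BMI,Outcome
111,36.6,0
150,30.1,1
95,22.4,0
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # number of rows and columns
print(df.head())  # first few rows of the table
```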


10. use of numpy

Advanced numerical computations 
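A typical example is doing arithmetic on a whole array at once, without writing a loop. A minimal sketch with made-up numbers:

```python
import numpy as np

# Hypothetical measurements stored in a NumPy array.
values = np.array([4.0, 8.0, 6.0, 2.0])

# Operations apply to every element at once.
mean = values.mean()
scaled = (values - mean) / values.std()  # zero mean, unit standard deviation

print(mean)
print(scaled)
```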


11. use of matplotlib

visualization using graphs, charts etc


12. What X and y indicate in ML


X - indicates the data part (the input features)

y - indicates the label part (the answer to be predicted)


13. X_train, X_test, y_train, y_test = train_test_split(diabetes.loc[:, diabetes.columns != 'Outcome'], diabetes['Outcome'], stratify=diabetes['Outcome'], random_state=66)


diabetes.loc[:, diabetes.columns != 'Outcome'] - here ':' selects all rows, and diabetes.columns != 'Outcome' selects every column except 'Outcome'. The result is the data part - X


diabetes['Outcome'] - indicates the label part which is y


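random_state=66 is the seed for the random shuffling done before the split, so the same split is reproduced every time the code runs. It does not set the percentage of training data. Since test_size is not given, train_test_split uses its default: 25% of the data for testing and 75% for training. stratify=diabetes['Outcome'] keeps the proportion of each Outcome class the same in the training and test parts.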
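The behaviour of these arguments can be checked on a small table. The sketch below uses a synthetic stand-in for the diabetes data (the columns Glucose and BMI and all values are made up; only the column name Outcome comes from the code above). It shows that, with no test_size given, a quarter of the rows go to the test set, and that the same random_state gives the same split again.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the diabetes table: 100 rows, 60 with Outcome 0, 40 with Outcome 1.
rng = np.random.default_rng(0)
diabetes = pd.DataFrame({
    "Glucose": rng.integers(70, 200, size=100),
    "BMI": rng.uniform(18, 45, size=100),
    "Outcome": [0] * 60 + [1] * 40,
})

X = diabetes.loc[:, diabetes.columns != "Outcome"]  # all rows, every column except Outcome
y = diabetes["Outcome"]                              # the label column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=66
)
print(X_train.shape, X_test.shape)  # (75, 2) (25, 2): default test share is 25%

# stratify keeps the class proportion (40% ones) in the test part.
print(y_test.mean())

# Same seed, same split.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, stratify=y, random_state=66
)
print(X_test.index.equals(X_test2.index))
```

Changing random_state to another number gives a different but equally valid split; changing the split size requires the test_size (or train_size) argument.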


14. from sklearn.neighbors import KNeighborsClassifier


sklearn is the package which contains ML algorithms

Here KNeighborsClassifier is imported from neighbors subpackage of sklearn


15. knn = KNeighborsClassifier(n_neighbors=9)

Create a model object for the KNeighborsClassifier algorithm with the model parameter n_neighbors set to 9, so that each prediction is decided by the 9 nearest training points.


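16. What is a model parameter

Each algorithm has some parameters or settings that are chosen before training, such as n_neighbors for KNeighborsClassifier. These settings are also called hyperparameters.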
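A quick way to see the settings of a model is get_params(). A minimal sketch comparing the default setting with the value used in this post:

```python
from sklearn.neighbors import KNeighborsClassifier

# No setting given: scikit-learn falls back to its default value.
default_knn = KNeighborsClassifier()

# Setting chosen explicitly, as in question 15.
knn = KNeighborsClassifier(n_neighbors=9)

print(default_knn.get_params()["n_neighbors"])  # 5
print(knn.get_params()["n_neighbors"])          # 9
```

Different values of n_neighbors can give different accuracy on the same data, so it is common to try a few values and compare the scores.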


17. knn.fit(X_train, y_train)

Training the model with the data (X_train) and the label/answer (y_train)


18. knn.score(X_test, y_test)


This is the testing process for calculating the score/accuracy of the model. For a classifier, score() returns the fraction of test examples predicted correctly.

step1 - Give the model the unseen data X_test

step2 - the result obtained from step1 is y_pred

step3 - compare y_pred with y_test
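The three steps can be written out by hand and compared with score(). The sketch below uses randomly generated data in place of the diabetes set (the data and the rule used to label it are assumptions made for the example):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data: the label is 1 when the first feature is positive.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(80, 3))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(20, 3))
y_test = (X_test[:, 0] > 0).astype(int)

knn = KNeighborsClassifier(n_neighbors=9)
knn.fit(X_train, y_train)

# step1 and step2: predict labels for the unseen test data.
y_pred = knn.predict(X_test)

# step3: compare predictions with the true labels.
manual_accuracy = np.mean(y_pred == y_test)

# score() does the same three steps in one call.
print(manual_accuracy, knn.score(X_test, y_test))
```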


19. Why we calculate accuracy

We calculate the accuracy of a model to decide whether it is fit for deployment.

If the accuracy is decent, the model can be applied in real-world applications.


20. Deploy

new_data=[[4,111,92,0,0,36.6,0.190,31]]

ans=knn.predict(new_data)


Here the unknown data is stored in the variable 'new_data'. The eight values must be in the same order as the columns of X.

The predict() function applies the trained model to this data and the result is stored in the variable 'ans'.


21. Full cycle


1. Download data and upload to colab

2. Split the data into training and test sets

3. Create a model variable

4. Train the model with training data

5. Find accuracy of the model using score function.

6. Deploy the model with unknown data using predict function and show the answer to user.
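The whole cycle fits in a short script. The sketch below is only an outline under stated assumptions: instead of a downloaded CSV it generates a small synthetic table (columns Glucose and BMI, with Outcome set to 1 when Glucose is above 130, all made up for the example), so it runs without any file. With the real data, step 1 would be a pd.read_csv call on the uploaded file.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1. Get the data (synthetic stand-in for the uploaded CSV file).
rng = np.random.default_rng(42)
n = 200
glucose = rng.uniform(70, 200, size=n)
bmi = rng.uniform(18, 45, size=n)
data = pd.DataFrame({
    "Glucose": glucose,
    "BMI": bmi,
    "Outcome": (glucose > 130).astype(int),  # made-up labelling rule
})

# 2. Split the data into training and test sets.
X = data.loc[:, data.columns != "Outcome"]
y = data["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=66
)

# 3. Create a model variable.
knn = KNeighborsClassifier(n_neighbors=9)

# 4. Train the model with the training data.
knn.fit(X_train, y_train)

# 5. Find the accuracy of the model using the score function.
accuracy = knn.score(X_test, y_test)
print("accuracy:", accuracy)

# 6. Deploy: predict the answer for new, unseen data.
new_data = pd.DataFrame([[180.0, 30.0]], columns=X.columns)
ans = knn.predict(new_data)
print("prediction:", ans[0])
```

The new data is built with the same column names and order as X so that the model receives the features in the form it was trained on.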









