Section 6: Advanced Machine Learning (51-70)
- What is the purpose of train_test_split() in Scikit-learn?
- a) Splitting DataFrames
- b) Splitting a dataset into training and testing sets
- c) Filtering missing values
- d) Performing PCA
Answer: b) Splitting a dataset into training and testing sets
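To illustrate the answer, here is a minimal sketch of a typical call. The toy arrays and the 25% test fraction are assumptions chosen for the example, not values from the quiz.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 20 samples with 2 features each (illustrative values only).
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

# Hold out 25% of the rows for testing; random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape)
```

The model is then fitted on X_train and y_train only, and X_test and y_test are kept aside for the final evaluation.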
- What is the main advantage of using Principal Component Analysis (PCA)?
- a) Increases data dimensionality
- b) Reduces dimensionality while preserving variance
- c) Improves feature importance
- d) Converts numerical data to categorical
Answer: b) Reduces dimensionality while preserving variance
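A small sketch of the idea, assuming synthetic data in which the third feature is almost a copy of the first, so two components should retain nearly all of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# 100 samples, 3 features; the third feature is the first plus a little noise.
base = rng.normal(size=(100, 2))
X = np.column_stack(
    [base[:, 0], base[:, 1], base[:, 0] + 0.01 * rng.normal(size=100)]
)

# Project from 3 dimensions down to 2.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)
# Fraction of the total variance kept by the two retained components.
print(pca.explained_variance_ratio_.sum())
```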
- Which Scikit-learn class is used to encode categorical feature columns?
- a) StandardScaler()
- b) OneHotEncoder()
- c) MinMaxScaler()
- d) LabelBinarizer()
Answer: b) OneHotEncoder()
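As a rough illustration, the sketch below encodes a single made-up colour column. The column values are assumptions for the example.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One categorical column with three distinct values (illustrative data).
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; convert it for printing.
encoded = encoder.fit_transform(colors).toarray()

print(encoder.categories_)  # categories are sorted: blue, green, red
print(encoded)              # one column per category, a single 1 per row
```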
- What is the purpose of cross_val_score() in Scikit-learn?
- a) Finds the best hyperparameters
- b) Performs cross-validation
- c) Trains a deep learning model
- d) Computes classification accuracy
Answer: b) Performs cross-validation
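A minimal sketch of 5-fold cross-validation. The choice of the Iris dataset and a logistic regression model is an assumption made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# The data is split into 5 folds; the model is refitted 5 times,
# each time scored on the fold that was left out.
scores = cross_val_score(model, X, y, cv=5)

print(scores)
print(scores.mean())
```

Note that cross_val_score() only reports scores; it does not search for hyperparameters.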
- Which ensemble method combines weak learners to form a strong learner?
- a) Boosting
- b) PCA
- c) K-Means
- d) Cross-validation
Answer: a) Boosting
- What is an imbalanced dataset?
- a) A dataset with missing values
- b) A dataset where one class has significantly more samples than another
- c) A dataset without categorical variables
- d) A dataset with equal class distribution
Answer: b) A dataset where one class has significantly more samples than another
- Which method is used to handle imbalanced datasets?
- a) Feature scaling
- b) SMOTE (Synthetic Minority Over-sampling Technique)
- c) PCA
- d) Cross-validation
Answer: b) SMOTE (Synthetic Minority Over-sampling Technique)
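The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The function below is a simplified NumPy sketch of that idea only; the name smote_like_oversample and its parameters are hypothetical, and a real project would normally use a maintained implementation instead.

```python
import numpy as np

def smote_like_oversample(X_minority, n_new, k, rng):
    """Simplified SMOTE-style sketch: interpolate towards a random near neighbour."""
    n = len(X_minority)
    # Pairwise Euclidean distances between minority samples.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbors = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)                     # random minority sample
        neighbor = neighbors[base, rng.integers(k)]  # one of its k nearest
        gap = rng.random()                         # position along the segment
        synthetic[i] = X_minority[base] + gap * (
            X_minority[neighbor] - X_minority[base]
        )
    return synthetic

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote_like_oversample(X_min, n_new=6, k=2, rng=np.random.default_rng(0))
print(new_points)
```

Every synthetic point lies on a line segment between two existing minority samples, so the new data stays inside the region the minority class already occupies.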
- Which boosting algorithm is widely used for structured data?
- a) KNN
- b) XGBoost
- c) Logistic Regression
- d) Linear Regression
Answer: b) XGBoost
- What does hyperparameter tuning do?
- a) Reduces dataset size
- b) Optimizes model performance
- c) Increases test accuracy
- d) Improves data quality
Answer: b) Optimizes model performance
- Which Scikit-learn class is used to search for the best hyperparameters?
- a) cross_val_score()
- b) GridSearchCV()
- c) train_test_split()
- d) StandardScaler()
Answer: b) GridSearchCV()
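A short sketch of a grid search, assuming a k-nearest-neighbours classifier on the Iris dataset and a small hand-picked grid of n_neighbors values:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate values to try; every combination is cross-validated.
param_grid = {"n_neighbors": [1, 3, 5, 7]}

search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```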
- What is feature engineering?
- a) A machine learning algorithm
- b) Creating new features from raw data
- c) Training a deep learning model
- d) Optimizing hyperparameters
Answer: b) Creating new features from raw data
- Which technique helps reduce overfitting in decision trees?
- a) Increasing depth
- b) Pruning
- c) Adding more features
- d) Reducing dataset size
Answer: b) Pruning
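One way to see pruning in practice is Scikit-learn's cost-complexity pruning parameter ccp_alpha. The sketch below compares an unpruned tree with a pruned one; the value 0.02 is an arbitrary assumption for the example.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unpruned tree: grows until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pruned tree: branches whose benefit is below ccp_alpha are collapsed.
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)

print(full_tree.tree_.node_count, pruned_tree.tree_.node_count)
```

The pruned tree has fewer nodes, which makes it less likely to memorise noise in the training data.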
- What is bagging in ensemble learning?
- a) Reducing dimensionality
- b) Training multiple models on random (bootstrap) subsets of data and aggregating their predictions
- c) Applying multiple activation functions
- d) Combining similar features
Answer: b) Training multiple models on random (bootstrap) subsets of data and aggregating their predictions
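A minimal sketch using Scikit-learn's BaggingClassifier, assuming decision trees as the base model and 10 estimators on the Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each of the 10 trees is fitted on its own bootstrap sample of the rows;
# predictions are combined by voting.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=0)
bag.fit(X, y)

print(len(bag.estimators_))
print(bag.score(X, y))
```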
- What is the primary goal of k-means clustering?
- a) Group similar data points together
- b) Perform feature scaling
- c) Increase accuracy of classification models
- d) Reduce data redundancy
Answer: a) Group similar data points together
- What metric is commonly used to evaluate clustering performance?
- a) R-squared
- b) Silhouette Score
- c) F1-score
- d) Accuracy
Answer: b) Silhouette Score
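The two clustering questions above can be tied together in one sketch: fit k-means on two synthetic, well-separated blobs (an assumption for the example) and score the result with the silhouette score.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Two well-separated groups of 2-D points.
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Ranges from -1 to 1; values near 1 mean tight, well-separated clusters.
score = silhouette_score(X, labels)
print(score)
```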
- What is the primary function of a confusion matrix?
- a) Evaluate classification model performance
- b) Improve model accuracy
- c) Transform data features
- d) Optimize neural networks
Answer: a) Evaluate classification model performance
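A small worked example with made-up binary labels shows what the matrix contains:

```python
from sklearn.metrics import confusion_matrix

# Illustrative ground truth and predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

print(cm)              # [[3 1]
                       #  [1 3]]
print(tn, fp, fn, tp)  # 3 1 1 3
```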
- Which metric is used to evaluate a regression model?
- a) Accuracy
- b) Mean Squared Error (MSE)
- c) F1-score
- d) Confusion Matrix
Answer: b) Mean Squared Error (MSE)
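MSE is the average of the squared differences between true and predicted values. A plain-Python sketch with made-up numbers:

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Squared errors: 0.25, 0.0, 2.25, 1.0 -> mean 0.875
mse = mean_squared_error(y_true, y_pred)
print(mse)  # 0.875
```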
- What is the key advantage of using a Random Forest over a single Decision Tree?
- a) Requires less data
- b) Reduces overfitting
- c) Runs faster
- d) Requires no hyperparameter tuning
Answer: b) Reduces overfitting
- What is the main function of a neural network activation function?
- a) Introduce non-linearity
- b) Reduce loss
- c) Normalize input values
- d) Increase model size
Answer: a) Introduce non-linearity
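To see why non-linearity matters, the NumPy sketch below stacks two small layers with hand-picked weights (an assumption for the example). Without an activation, the two layers collapse into a single linear map; with ReLU between them, they do not.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: zero out negative values."""
    return np.maximum(0.0, x)

W1 = np.array([[1.0, -1.0], [0.5, 2.0]])  # first layer weights
W2 = np.array([[2.0, 1.0]])               # second layer weights
x = np.array([1.0, 2.0])

# No activation: identical to applying the single matrix W2 @ W1.
linear_out = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x

# ReLU in between: the result can no longer be written as one matrix product.
nonlinear_out = W2 @ relu(W1 @ x)

print(linear_out, collapsed)  # [2.5] [2.5]
print(nonlinear_out)          # [4.5]
```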
- What is dropout used for in deep learning?
- a) Increase training time
- b) Improve feature selection
- c) Prevent overfitting
- d) Reduce activation function complexity
Answer: c) Prevent overfitting
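Dropout randomly zeroes a fraction of activations during training so the network cannot rely on any single unit. Below is a NumPy sketch of the commonly used "inverted dropout" variant; the function name and signature are hypothetical and not taken from any framework.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout sketch: drop units with probability `rate` while training."""
    if not training or rate == 0.0:
        return activations  # dropout is switched off at inference time
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    # Rescale the survivors so the expected activation stays unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones(1000)
out = dropout(a, rate=0.5, rng=rng)

print(np.unique(out))  # surviving units are scaled to 2.0, dropped ones are 0.0
print(out.mean())      # close to the original mean of 1.0
```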
Section 7: Deep Learning (71-85)
- What is the fundamental unit of a neural network?
- a) Batch size
- b) Neuron
- c) Feature selection
- d) Gradient descent
Answer: b) Neuron
- What type of activation function is commonly used in hidden layers?
- a) Sigmoid
- b) Softmax
- c) ReLU
- d) Linear
Answer: c) ReLU
- What is the main function of an optimizer in deep learning?
- a) Improve model visualization
- b) Minimize the loss function
- c) Increase learning rate
- d) Reduce dataset size
Answer: b) Minimize the loss function
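As a minimal sketch of what "minimize the loss" means, the loop below runs plain gradient descent (not Adam or any framework optimizer) on a toy one-parameter loss. The loss function, learning rate, and step count are assumptions for the example.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def grad(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0               # initial parameter value
learning_rate = 0.1

for _ in range(100):
    # Step against the gradient to reduce the loss.
    w -= learning_rate * grad(w)

print(w, loss(w))
```

Optimizers used in practice refine this basic update, for example with momentum or per-parameter step sizes.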
- What is the role of a convolutional layer in a CNN?
- a) Process textual data
- b) Detect spatial patterns in images
- c) Normalize feature values
- d) Remove overfitting
Answer: b) Detect spatial patterns in images
- What is the primary difference between CNN and RNN?
- a) CNN is for text, RNN is for images
- b) CNN is for images, RNN is for sequential data
- c) CNN is supervised, RNN is unsupervised
- d) CNN uses more parameters
Answer: b) CNN is for images, RNN is for sequential data
- What is the main challenge of training deep neural networks?
- a) Lack of training data
- b) Vanishing gradient problem
- c) High training speed
- d) Small dataset sizes
Answer: b) Vanishing gradient problem
- What is the primary function of dropout in neural networks?
- a) Improve accuracy
- b) Reduce overfitting
- c) Increase training speed
- d) Improve data visualization
Answer: b) Reduce overfitting
- Which optimizer is commonly used in deep learning?
- a) SGD
- b) Adam
- c) Linear Regression
- d) XGBoost
Answer: b) Adam
- What is the primary role of a fully connected layer in a neural network?
- a) Reduce overfitting
- b) Combine learned features for final classification
- c) Normalize input data
- d) Generate new data points
Answer: b) Combine learned features for final classification
- What is transfer learning in deep learning?
- a) Using different datasets for training
- b) Training a model from scratch
- c) Reusing a pre-trained model for a different task
- d) Reducing training time using feature selection
Answer: c) Reusing a pre-trained model for a different task
- Which deep learning framework is widely used for NLP and vision tasks?
- a) Scikit-learn
- b) OpenCV
- c) TensorFlow
- d) NumPy
Answer: c) TensorFlow
- What is batch normalization used for in deep learning?
- a) Improve visualization
- b) Reduce model complexity
- c) Stabilize and accelerate training
- d) Improve data augmentation
Answer: c) Stabilize and accelerate training
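A NumPy sketch of the training-time forward pass of batch normalization, assuming a 2-D batch of shape (samples, features). The names gamma and beta follow the usual convention for the learnable scale and shift; running statistics for inference are left out.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Two features on very different scales (illustrative values).
x = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
out = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))

print(out.mean(axis=0))  # approximately 0 for each feature
print(out.std(axis=0))   # approximately 1 for each feature
```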
- What is an epoch in deep learning?
- a) A new layer in a neural network
- b) One complete pass through the entire training dataset
- c) A type of activation function
- d) A convolution operation
Answer: b) One complete pass through the entire training dataset
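The relationship between epochs, batches, and update steps can be sketched with a bare training loop. The sample count, batch size, and the helper iterate_minibatches are assumptions for the example, and the actual weight update is omitted.

```python
def iterate_minibatches(n_samples, batch_size):
    """Yield index ranges that together cover the dataset exactly once."""
    for start in range(0, n_samples, batch_size):
        yield range(start, min(start + batch_size, n_samples))

n_samples, batch_size, n_epochs = 10, 4, 3
steps = 0

for epoch in range(n_epochs):
    seen = 0
    for batch in iterate_minibatches(n_samples, batch_size):
        seen += len(batch)  # a real loop would update the weights here
        steps += 1
    # After one epoch, every training sample has been seen exactly once.
    print(f"epoch {epoch + 1}: saw {seen} samples")

print(steps)  # 3 batches per epoch * 3 epochs = 9
```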
- What is the primary advantage of using an autoencoder?
- a) Increase model accuracy
- b) Perform unsupervised feature learning
- c) Improve dataset size
- d) Reduce overfitting
Answer: b) Perform unsupervised feature learning
- What is reinforcement learning primarily used for?
- a) Text processing
- b) Image classification
- c) Decision-making in dynamic environments
- d) Data augmentation
Answer: c) Decision-making in dynamic environments
Section 8: Natural Language Processing (86-100)
- What is Tokenization in NLP?
- a) Splitting text into words or sentences
- b) Removing stopwords
- c) Stemming words
- d) Applying TF-IDF
Answer: a) Splitting text into words or sentences
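A rough regex-based sketch of word and sentence tokenization. The function names are hypothetical, and real tokenizers handle many cases (abbreviations, contractions, URLs) that this sketch ignores.

```python
import re

def simple_word_tokenize(text):
    """Split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def simple_sentence_tokenize(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

text = "NLP is fun. Tokenization comes first!"

print(simple_word_tokenize(text))
# ['NLP', 'is', 'fun', '.', 'Tokenization', 'comes', 'first', '!']
print(simple_sentence_tokenize(text))
# ['NLP is fun.', 'Tokenization comes first!']
```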
- What does TF-IDF stand for?
- a) Term First-In Document Frequency
- b) Text Feature In-depth
- c) Term Frequency – Inverse Document Frequency
- d) Token Feature Index
Answer: c) Term Frequency – Inverse Document Frequency
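A plain-Python sketch of one common TF-IDF variant: term frequency is the count divided by document length, and inverse document frequency is log(N / df). Libraries often add smoothing and normalization, so their numbers will differ. The tiny corpus is made up for the example.

```python
import math
from collections import Counter

docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran", "away"],
]

def tf_idf(docs):
    """Return one {term: score} dict per document (unsmoothed variant)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append(
            {
                term: (count / len(doc)) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return scores

scores = tf_idf(docs)
print(scores[1])
```

A word such as "the" that appears in every document gets a score of 0, while a word unique to one document, such as "dog", scores highest there.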
- What is the purpose of Word2Vec in NLP?
- a) Convert text into images
- b) Represent words as numerical vectors
- c) Classify documents
- d) Remove stopwords
Answer: b) Represent words as numerical vectors
- Which model is commonly used in NLP for text generation?
- a) CNN
- b) Transformer
- c) K-Means
- d) Decision Tree
Answer: b) Transformer
- What is the difference between stemming and lemmatization?
- a) No difference
- b) Stemming chops off word endings by rule; lemmatization returns the dictionary form (lemma) of a word
- c) Lemmatization uses AI
- d) Stemming is better
Answer: b) Stemming chops off word endings by rule; lemmatization returns the dictionary form (lemma) of a word
- Which deep learning model is widely used in NLP tasks?
- a) Decision Trees
- b) BERT (Bidirectional Encoder Representations from Transformers)
- c) SVM
- d) Random Forest
Answer: b) BERT
- What is the primary use of Named Entity Recognition (NER)?
- a) Detecting sentiment
- b) Identifying names, locations, and entities in text
- c) Tokenizing sentences
- d) Predicting next words in a sentence
Answer: b) Identifying names, locations, and entities in text
- Which NLP technique is used to summarize text?
- a) Named Entity Recognition
- b) Text Summarization
- c) Sentiment Analysis
- d) Topic Modeling
Answer: b) Text Summarization
- What is the main goal of sentiment analysis?
- a) Classify emails
- b) Determine the emotional tone of a text
- c) Translate text
- d) Convert speech to text
Answer: b) Determine the emotional tone of a text
- Which of the following is a popular NLP dataset?
- a) CIFAR-10
- b) IMDB Reviews
- c) MNIST
- d) COCO
Answer: b) IMDB Reviews
- What is BLEU score used for in NLP?
- a) Evaluate machine translation quality
- b) Measure topic coherence
- c) Detect entities in text
- d) Perform text clustering
Answer: a) Evaluate machine translation quality
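BLEU is built from clipped ("modified") n-gram precisions combined with a brevity penalty. The sketch below implements only the clipped precision for a single reference, as a simplified illustration rather than a full BLEU implementation; the function name is hypothetical.

```python
from collections import Counter

def modified_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate against one reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand_counts = ngrams(candidate)
    ref_counts = ngrams(reference)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total

reference = "the cat is on the mat".split()
candidate = "the the the the the the the".split()

# "the" occurs 7 times in the candidate but only 2 times in the reference.
p1 = modified_precision(candidate, reference, n=1)
print(p1)  # 2/7, about 0.2857
```

The clipping step is what stops a degenerate translation that repeats a common word from scoring well.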
- Which language model is used in OpenAI’s ChatGPT?
- a) CNN
- b) GPT (Generative Pre-trained Transformer)
- c) RNN
- d) LSTM
Answer: b) GPT
- What is Zero-shot learning in NLP?
- a) Making predictions without prior training on a specific task
- b) Removing stopwords
- c) Sentiment analysis
- d) Topic modeling
Answer: a) Making predictions without prior training on a specific task
- Which of the following is a Python library built specifically for NLP?
- a) NLTK
- b) OpenCV
- c) TensorFlow
- d) PyTorch
Answer: a) NLTK
- What is POS tagging in NLP?
- a) Identifying parts of speech in text
- b) Detecting named entities
- c) Summarizing text
- d) Translating text
Answer: a) Identifying parts of speech in text