Olete.in – MCQs, Mock Tests & Government Job Prep | Big Data: Introduction to Big Data MCQ Set 2

1. Training a model on all of the data in one single batch is known as?
Batch learning
Offline learning
Both A and B
None of the above

2. In model-based learning methods, an iterative process takes place on the ML models that are built based on various model parameters, called?
mini-batches
optimized parameters
hyperparameters
superparameters

3. Which of the following is a widely used and effective machine learning algorithm based on the idea of bagging?
Decision Tree
Regression
Classification
Random Forest

4. Which of the following is a disadvantage of decision trees?
Factor analysis
Decision trees are robust to outliers
Decision trees are prone to be overfit
None of the above

5. How do you handle missing or corrupted data in a dataset?
Drop missing rows or columns
Replace missing values with mean/median/mode
Assign a unique category to missing values
All of the above
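
The strategies named in the options of question 5 can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the `ages` and `cities` columns are hypothetical toy data, with `None` standing in for a missing value.

```python
from statistics import mean, median, mode

# Hypothetical toy columns; None marks a missing entry.
ages = [25, None, 31, 40, None, 31]
cities = ["Pune", None, "Delhi", "Pune"]

# Values that are actually present in the numeric column.
observed = [a for a in ages if a is not None]

# Strategy 1: drop the rows whose value is missing.
dropped = observed

# Strategy 2: replace missing values with a summary statistic.
def fill(column, value):
    """Return a copy of column with every None replaced by value."""
    return [value if x is None else x for x in column]

filled_mean = fill(ages, mean(observed))
filled_median = fill(ages, median(observed))
filled_mode = fill(ages, mode(observed))

# Strategy 3: give missing categorical values their own category.
filled_cities = fill(cities, "Unknown")
```

Which strategy is appropriate depends on how much data is missing and why; the sketch only shows the mechanics.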

6. When performing regression or classification, which of the following is the correct way to preprocess the data?
Normalize the data -> PCA -> training
PCA -> normalize PCA output -> training
Normalize the data -> PCA -> normalize PCA output -> training
None of the above
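
One reason the ordering of normalization and PCA matters is that PCA follows variance, so a feature measured on a large scale can dominate the components. The sketch below, with hypothetical `height_cm` and `income` columns, compares each feature's share of total variance before and after standardization. It does not run PCA itself.

```python
from statistics import mean, pstdev, pvariance

def standardize(column):
    """Rescale a column to zero mean and unit (population) variance."""
    mu = mean(column)
    sigma = pstdev(column)
    return [(x - mu) / sigma for x in column]

# Hypothetical features on very different scales.
height_cm = [150.0, 160.0, 170.0, 180.0]
income = [20000.0, 40000.0, 60000.0, 80000.0]

# Share of the total variance carried by income on the raw scale.
raw_share = pvariance(income) / (pvariance(income) + pvariance(height_cm))

# The same share after standardizing both features.
z_height = standardize(height_cm)
z_income = standardize(income)
scaled_share = pvariance(z_income) / (pvariance(z_income) + pvariance(z_height))

print(f"income share of variance, raw:    {raw_share:.6f}")
print(f"income share of variance, scaled: {scaled_share:.6f}")
```

On the raw scale nearly all of the variance belongs to `income`; after standardization both features contribute equally.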

7. Which of the following statements about regularization is not correct?
Using too large a value of lambda can cause your hypothesis to underfit the data.
Using too large a value of lambda can cause your hypothesis to overfit the data
Using a very large value of lambda cannot hurt the performance of your hypothesis.
None of the above

8. What is a sentence parser typically used for?
It is used to parse sentences to check if they are utf-8 compliant.
It is used to parse sentences to derive their most likely syntax tree structures.
It is used to parse sentences to assign POS tags to all tokens.
It is used to check if sentences can be parsed into meaningful tokens.

9. To find the minimum or the maximum of a function, we set the gradient to zero because:
The value of the gradient at extrema of a function is always zero
Depends on the type of problem
Both A and B
None of the above

10. Which of the following techniques cannot be used for normalization in text mining?
Stemming
Lemmatization
Stop Word Removal
None of the above

11. In which of the following cases will K-means clustering fail to give good results? 1) Data points with outliers 2) Data points with different densities 3) Data points with nonconvex shapes
1 and 2
2 and 3
1 and 3
All of the above

12. Data Analysis is a process of?
inspecting data
cleaning data
transforming data
All of the above

13. Which of the following is not a major data analysis approach?
Data Mining
Predictive Intelligence
Business Intelligence
Text Analytics

14. How many main statistical methodologies are used in data analysis?
2
3
4
5

15. In descriptive statistics, data from the entire population or a sample is summarized with?
integer descriptors
floating descriptors
numerical descriptors
decimal descriptors

16. Data analysis was defined by which statistician?
William S.
Hans Peter Luhn
Gregory Piatetsky-Shapiro
John Tukey

18. Which of the following is true about hypothesis testing?
answering yes/no questions about the data
estimating numerical characteristics of the data
describing associations within the data
modeling relationships within the data

19. The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities.
True
False
Can be true or false
Can not say

20. The branch of statistics which deals with the development of particular statistical methods is classified as:
industry statistics
economic statistics
applied statistics
mathematical statistics

21. Which of the following is true about regression analysis?
answering yes/no questions about the data
estimating numerical characteristics of the data
modeling relationships within the data
describing associations within the data

22. Text analytics is also referred to as text mining.
True
False
Can be true or false
Can not say

23. What is true about Data Visualization?
Data Visualization is used to communicate information clearly and efficiently to users by the usage of information graphics such as tables and charts.
Data Visualization helps users in analyzing a large amount of data in a simpler way.
Data Visualization makes complex data more accessible, understandable, and usable.
All of the above

24. Data can be visualized using?
graphs
charts
maps
All of the above

25. Data visualization is also an element of the broader _____________.
deliver presentation architecture
data presentation architecture
dataset presentation architecture
data process architecture

26. Which method shows hierarchical data in a nested format?
Treemaps
Scatter plots
Population pyramids
Area charts

27. Which function is used for inference on one proportion using the normal approximation?
fisher.test()
chisq.test()
Lm.test()
prop.test()

28. Which is used to find the factor congruence coefficients?
factor.mosaicplot
factor.xyplot
factor.congruence
factor.cumsum

29. Which of the following is a tool for checking normality?
qqline()
qline()
anova()
lm()

30. Which of the following is false?
Data visualization includes the ability to absorb information quickly
Data visualization is another form of visual art
Data visualization decreases insights and leads to slower decisions
None of the above

31. Common use cases for data visualization include?
Politics
Sales and marketing
Healthcare
All of the above

32. Which of the following plots are often used for checking randomness in time series?
Autocausation
Autorank
Autocorrelation
None of the above

36. Which of the following is a reasonable way to select the number of principal components "k"?
Choose k to be the smallest value so that at least 99% of the variance is retained.
Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).
Choose k to be the largest value so that 99% of the variance is retained.
Use the elbow method.
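
The variance-retention criterion mentioned in question 36 can be sketched as follows. This is an illustrative example under stated assumptions: the eigenvalues are hypothetical, and `smallest_k` is a helper name introduced here, not part of any library.

```python
from itertools import accumulate

def smallest_k(eigenvalues, threshold=0.99):
    """Return the smallest k whose top-k eigenvalues retain at least
    `threshold` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= threshold:
            return k
    return len(ordered)

# Hypothetical eigenvalues of a covariance matrix (variance per component).
eigenvalues = [6.0, 3.0, 0.95, 0.04, 0.01]

k = smallest_k(eigenvalues)
print(k)
```

With these values the first two components retain 90% of the variance and the first three retain 99.5%, so the function returns 3.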

37. Which of the following is false?
Subsetting can be used to select and exclude variables and observations
Raw data should be processed only one time.
Merging concerns combining datasets on the same observations to produce a result with more variables
None of the above

38. According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop?
Big data management and data mining
Data warehousing and business intelligence
Management of Hadoop clusters
Collecting and storing unstructured data

39. The ________ programming language is a dialect of S.
B
C
R
None of the above

40. Files containing R scripts end with the extension _______.
R
S
bigdata
All of the above

41. Which of the following is a subset of machine learning?
Numpy
SciPy
Deep Learning
All of the above

42. Deep learning algorithms are constructed with how many layers?
2
3
4
5