DEPARTMENT OF INFORMATION TECHNOLOGY
Data Warehousing and Data Mining
Sub Code : CS2032
Sub Name: Data Warehousing and Data Mining
1. Define the term ‘Data Warehouse’.
2. Write down the applications of data warehousing.
3. When is a data mart appropriate?
4. List out the functionality of metadata.
5. What are the nine decisions in the design of a data warehouse?
6. List out the two different types of reporting tools.
7. Why is data mining used in all organizations?
8. What are the technical issues to be considered when designing and implementing a data warehouse environment?
9. List out some of the examples of access tools.
10. What are the advantages of data warehousing?
11. Give the difference between horizontal and vertical parallelism.
12. Draw a neat diagram for the Distributed memory shared disk architecture.
13. Define star schema.
14. What are the reasons for the very good performance achieved by SYBASE IQ technology?
15. What are the steps to be followed to store data from an external source in the data warehouse?
16. Define Legacy data.
17. Draw the standard framework for metadata interchange.
18. List out the five main groups of access tools.
19. Define Data Visualization.
20. What are the various forms of data preprocessing?
21. How is data warehouse different from database? How are they similar?
22. What is data transformation? Give an example.
23. With an example, explain what metadata is.
24. What is a data mart?
1. Enumerate the building blocks of data warehouse. Explain the importance of metadata in a data warehouse environment. 
2. Explain the various methods of data cleaning in detail.
3. Diagrammatically illustrate and discuss the data warehousing architecture, and briefly explain the components of a data warehouse.
4. (i) Distinguish between data warehousing and data mining. (ii) Describe data extraction and cleanup in detail.
5. Write short notes on:
(i) Transformation (ii) Metadata
6. List and discuss the steps involved in mapping the data warehouse to a multiprocessor architecture. 
7. Discuss bitmapped indexing in detail.
8. Explain the different vendor solutions in detail.
1. Differentiate between OLAP and OLTP.
2. Classify OLAP tools.
3. What is meant by OLAP?
4. Differentiate between OLAP and OLTP.
5. Define Concept Hierarchy.
6. List out the five categories of decision support tools.
7. Define Cognos Impromptu.
8. List out any 5 OLAP guidelines.
9. Distinguish between multidimensional and multi-relational OLAP.
10. Define ROLAP.
11. Draw a neat diagram for the web processing model.
12. Define MQE.
13. Draw a neat sketch of the three-tiered client/server architecture.
14. List out the applications that organizations use to build a query and reporting environment for the data warehouse.
15. Distinguish between the window painter and the DataWindow painter.
16. Define ADF, SGF and DEF.
17. What is the function of the PowerPlay Administrator?
1. Discuss the typical OLAP operations with an example. 
2. List and discuss the basic features provided by the reporting and query tools used for business analysis.
3. Describe Cognos Impromptu in detail.
4. Explain OLAP in detail.
5. With relevant examples discuss multidimensional online analytical processing and multi-relational online analytical processing. 
6. Discuss OLAP tools and the Internet.
7. (i) Explain the multidimensional data model.
(ii) Discuss how computations can be performed efficiently on data cubes.
1. Define data.
2. State why data preprocessing is an important issue for data warehousing and data mining.
3. What is the need for discretization in data mining?
4. What are the various forms of data preprocessing?
5. What is a concept hierarchy? Give an example.
6. What are the various forms of data preprocessing?
7. Mention the various tasks to be accomplished as part of data pre-processing.
8. Define Data Mining.
9. List out any four data mining tools.
10. What do data mining functionalities include?
11. Define patterns.
1. (i) Explain the various primitives for specifying a data mining task.
(ii) Describe the various descriptive statistical measures for data mining.
2. Discuss the different types of data and functionalities.
3. (i) Describe the interestingness of patterns in detail.
(ii) Explain data mining task primitives in detail.
4. (i) Discuss the different issues of data mining.
(ii) Explain data preprocessing in detail.
5. How are data mining systems classified? Discuss each classification with an example.
6. How can a data mining system be integrated with a data warehouse? Discuss with an example.
ASSOCIATION RULE AND CLASSIFICATION
1. What is meant by market basket analysis?
2. What is the use of multilevel association rules?
3. What is meant by pruning in a decision tree induction?
4. Write the two measures of association rules.
5. With an example, explain correlation analysis.
6. Define conditional pattern base.
7. List out the major strengths of the decision tree method.
8. In classification trees, what are surrogate splits, and how are they used?
9. The Naïve Bayes’ classifier makes what assumptions that motivate its name?
10. What is the frequent item set property?
11. List out the major strengths of decision tree induction.
12. Write the two measures of association rules.
13. How are association rules mined from large databases?
14. What is tree pruning in decision tree induction?
15. What is the use of multi level association rules?
16. What are the Apriori properties used in the Apriori algorithm?
17. How is prediction different from classification?
18. What is a support vector machine?
19. What are the means to improve the performance of association rule mining algorithms?
20. State the advantages of the decision tree approach over other approaches for performing classification.
1. Decision tree induction is a popular classification method. Taking one typical decision tree induction algorithm, briefly outline the method of decision tree classification.
2. Consider the following training dataset and the original decision tree induction algorithm (ID3). Risk is the class label attribute. The Height values have already been discretized into disjoint ranges. Calculate the information gain if Gender is chosen as the test attribute. Calculate the information gain if Height is chosen as the test attribute. Draw the final decision tree (without any pruning) for the training dataset. Generate all the “IF-THEN” rules from the decision tree.
Gender Height Risk
F (1.5, 1.6) Low
M (1.9, 2.0) High
F (1.8, 1.9) Medium
F (1.8, 1.9) Medium
F (1.6, 1.7) Low
M (1.8, 1.9) Medium
F (1.5, 1.6) Low
M (1.6, 1.7) Low
M (2.0, ∞) High
M (2.0, ∞) High
F (1.7, 1.8) Medium
M (1.9, 2.0) Medium
F (1.8, 1.9) Medium
F (1.7, 1.8) Medium
F (1.7, 1.8) Medium
(a) Given the following transactional database
1 C, B, H
2 B, F, S
3 A, F, G
4 C, B, H
5 B, F, G
6 B, E, O
(i) We want to mine all the frequent itemsets in the data using the Apriori algorithm. Assume the minimum support level is 30%. (You need to give the sets of frequent itemsets L1, L2, … and the candidate itemsets C1, C2, ….)
(ii) Find all the association rules that involve only B, C, H (on either the left-hand or right-hand side of the rule). The minimum confidence is 70%.
3. Describe the multi-dimensional association rule, giving a suitable example. 
4. (a) Explain the algorithm for constructing a decision tree from training samples.
(b) Explain Bayes' theorem.
6. Develop an algorithm for classification using Bayesian classification. Illustrate the algorithm with a relevant example. 
7. Discuss the approaches for mining multi level association rules from the transactional databases. Give relevant example. 
8. Write and explain the algorithm for mining frequent item sets without candidate generation. Give relevant example. 
9. How is attribute-oriented induction implemented? Explain in detail.
10. Discuss Bayesian classification in detail.
11. A database has four transactions. Let min_sup = 60% and min_conf = 80%.
Find all frequent itemsets using Apriori and FP-growth, respectively. Compare the efficiency of the two mining processes.
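Worked sketch for the Apriori exercise above (transactions 1 to 6, minimum support 30%). This is an illustrative Python sketch under stated assumptions, not a model answer: the names TRANSACTIONS, apriori and confidence are hypothetical, and the level-wise join-and-prune shown here is only one straightforward way to build the candidate sets C1, C2, … and the frequent sets L1, L2, ….

```python
from itertools import combinations

# Transactions 1-6 copied from the exercise.
TRANSACTIONS = [
    {"C", "B", "H"},
    {"B", "F", "S"},
    {"A", "F", "G"},
    {"C", "B", "H"},
    {"B", "F", "G"},
    {"B", "E", "O"},
]


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    n = len(transactions)
    # C1: every single item that occurs in the data.
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Lk: candidates that meet the minimum support.
        level = {c: s for c, s in counts.items() if s / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, lhs, rhs):
    """Confidence of the rule lhs -> rhs, using the stored support counts."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    return frequent[lhs | rhs] / frequent[lhs]


if __name__ == "__main__":
    result = apriori(TRANSACTIONS, 0.30)
    for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
        print(sorted(itemset), result[itemset])
```

With a 30% threshold an itemset must appear in at least 2 of the 6 transactions. Under that reading the sketch reports {B, C, H} as the only frequent 3-itemset, and a rule such as C → B has confidence 2/2 while B → C has confidence 2/5, which falls below the 70% threshold in part (ii).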
CLUSTERING AND APPLICATIONS AND TRENDS IN DATA MINING
1. What are the requirements of clustering?
2. What are the applications of spatial databases?
3. What is text mining?
4. Distinguish between classification and clustering.
5. Define a Spatial database.
6. List out any two commercial data mining tools.
7. What is the objective function of the K-means algorithm?
8. Mention the advantages of Hierarchical clustering.
9. Distinguish between classification and clustering.
10. List the requirements of clustering in data mining.
11. What is web usage mining?
12. What are the requirements of clustering?
13. What are the applications of spatial databases?
14. What is text mining?
15. What is cluster analysis?
16. What are the two data structures in cluster analysis?
17. What is an outlier? Give example.
18. What is audio data mining?
19. List two applications of data mining.
1. BIRCH and CLARANS are two interesting clustering algorithms that perform effective clustering in large data sets.
(i) Outline how BIRCH performs clustering in large data sets.  (ii) Compare and outline the major differences of the two scalable clustering algorithms BIRCH and CLARANS. 
2. Write a short note on web mining taxonomy. Explain the different activities of text mining.
3. Discuss and elaborate the current trends in data mining.
4. Discuss spatial databases and text databases.
5. What is a multimedia database? Explain the methods of mining multimedia databases.
6. Explain the following clustering methods in detail:
(a) BIRCH (b) CURE
7. Discuss in detail about any four data mining applications. 
8. Write short notes on:
(i) Partitioning methods (ii) Outlier analysis
9. Describe K-means clustering with an example.
10. Describe in detail about Hierarchical methods.
11. With a relevant example, discuss constraint-based cluster analysis.