Summaries
Introduction to data mining
17 May 2022, 13:00 • António Ferreira
Introduction to data mining: overview; infrastructure; typical tasks; how it is being used; virtuous cycle; classification with decision trees and neural networks; hierarchical agglomerative and k-means clustering; association rules; evaluation of classification models; difficulties in using data mining.
ETL system
10 May 2022, 13:00 • António Ferreira
ETL system: staging steps of a data warehouse; conceptual ETL plan; logical data map; ETL build sequence; metadata; flat files vs. databases; data quality screens; conforming data; loading data into dimensions; handling SCD type 2 changes; loading data into facts; loading snapshot fact tables; indexes during ETL processing; outwitting the database log; increasing ETL throughput.
Physical design of data warehouses
3 May 2022, 13:00 • António Ferreira
Physical design of data warehouses: motivation; tree-based, hash-based, and bitmap indexes; clustered and multi-attribute indexes; materialized views; data compression; data partitioning; distributed storage.