19 Cards in this Set
- Front
- Back
Data Mining and Seven Steps
|
The process of discovering interesting patterns from big data. It involves seven steps: data cleaning, data integration, data selection, data transformation, pattern discovery, pattern evaluation, and knowledge presentation.
|
|
When is a pattern considered interesting?
|
If it is valid on new or test data with some degree of certainty, novel, potentially useful, and easily understood by humans. An interesting pattern represents knowledge.
|
|
The 4 major dimensions of data mining.
|
Data, knowledge, technology, applications
|
|
Data Warehouse
|
A repository for long-term storage of data from multiple sources, organized to support decision making. It has a unified schema and allows for Online Analytical Processing (OLAP).
|
|
Multidimensional Data Mining
|
Sometimes called exploratory multidimensional data mining, it integrates core data mining techniques with OLAP-based multidimensional analysis.
|
|
Data Mining Applications
|
Business intelligence, web searching and page ranking, bioinformatics, intrusion detection (cybersecurity), fraud detection
|
|
Data Mining Functionalities
|
Specific types of patterns or knowledge that can be found in a data mining task.
|
|
List of functionalities of data mining.
|
Associations and correlations, classification and regression, cluster analysis, outlier detection
|
|
Data mining is interdisciplinary. What are some of the domains it draws on?
|
Statistics, Machine learning, database and data warehouse systems, information retrieval.
|
|
4 different challenges in data mining.
|
Efficiency, scalability, diverse data types, mining methodology.
|
|
Define 6 aspects of data quality.
|
Data Quality is defined in terms of accuracy, completeness, consistency, timeliness, believability, and interpretability.
|
|
Data Cleaning
|
Attempts to fill in missing values, smooth out noise, identify outliers, and correct inconsistencies in data. Involves error detection and data transformation. |
|
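A minimal sketch of two of the cleaning steps named on this card, filling missing values and identifying outliers. The median fill and the 1.5 x IQR fence are common choices assumed here for illustration, not something the card prescribes; the function names and sample readings are hypothetical.

```python
from statistics import median, quantiles

def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical sensor readings with one gap and one suspicious spike.
readings = [10, 12, None, 11, 13, 12, 11, 10, 95]
filled = fill_missing(readings)
outliers = iqr_outliers(filled)
```

The median is used for the fill because, unlike the mean, it is not pulled toward the spike that the second step is meant to catch.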
Data Integration
|
Combines data from multiple sources into a coherent data store, such as a data warehouse.
|
|
Data Reduction
|
Techniques that obtain a reduced representation of the data while minimizing the loss of information content. Includes dimensionality reduction, numerosity reduction, and data compression. |
|
Data Transformation
|
Converts data into the appropriate format for mining. Includes data discretization, normalization, and concept hierarchy generation.
|
|
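Normalization, one of the transformations listed on this card, can be sketched with min-max scaling. This is one common normalization method chosen here as an assumption; the function name, default range, and sample values are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale.
        return [new_min for _ in values]
    return [
        (v - lo) / (hi - lo) * (new_max - new_min) + new_min
        for v in values
    ]

# Hypothetical incomes in thousands.
scaled = min_max_normalize([20, 30, 40])
```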
Data Discretization
|
Transforms numeric data by mapping values to interval or concept labels. Includes binning, histogram analysis, cluster analysis, decision tree analysis, and correlation analysis.
|
|
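Binning, the first discretization technique on this card, can be sketched as equal-width binning: the value range is split into k intervals of the same width and each value is replaced by its interval index. Equal-width is only one binning variant, and the function name and sample ages are hypothetical.

```python
def equal_width_bins(values, k):
    """Map each numeric value to a bin index in 0..k-1 using k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values are identical, so everything falls in the first bin.
        return [0 for _ in values]
    # The maximum value would land on index k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

# Hypothetical ages discretized into 3 intervals: [4, 14), [14, 24), [24, 34].
ages = [4, 8, 15, 21, 24, 25, 28, 34]
bins = equal_width_bins(ages, 3)
```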
What is concept hierarchy generation
|
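Organizing the values of an attribute into levels of abstraction, so that low-level values map to more general concepts. Ex: street < city < state < country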
|
|
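A concept hierarchy maps low-level attribute values to more general concepts. A minimal sketch for a location attribute, assuming a hand-built city < state < country hierarchy; the lookup tables and the `roll_up` helper are hypothetical, for illustration only.

```python
# Hypothetical hierarchy for a location attribute: city < state < country.
CITY_TO_STATE = {
    "Chicago": "Illinois",
    "Urbana": "Illinois",
    "Seattle": "Washington",
}
STATE_TO_COUNTRY = {
    "Illinois": "USA",
    "Washington": "USA",
}

def roll_up(city, level):
    """Generalize a city to the requested level of the hierarchy."""
    if level == "city":
        return city
    state = CITY_TO_STATE[city]
    if level == "state":
        return state
    if level == "country":
        return STATE_TO_COUNTRY[state]
    raise ValueError(f"unknown level: {level}")

# Replace each city with its state to view the data at a coarser level.
cities = ["Chicago", "Urbana", "Seattle"]
states = [roll_up(c, "state") for c in cities]
```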
What is Nominal Data
|
Labels or names categories without any quantitative value or meaningful order. Ex: hair color coded as Brown = 2, where the number is only a code.
|
|
What is Discrete Data
|
Based on counts; values are typically integers drawn from a finite or countably infinite set.
|