216 Cards in this Set
- Front
- Back
Give examples of which business functions BI can be used in |
Marketing and sales Supply Chain Finance IT |
|
Give examples of exemplary usage scenario of BI in Marketing and Sales |
- Customer Journey Analysis: analysis of customers' social, mobile and location data - More accurately attribute sales to advertising campaigns -> prioritise marketing spending - Analyse accuracy of salespeople's predictions - Use smartphone and car location devices to monitor how salespeople actually spend their time |
|
Give examples of exemplary usage scenario of BI in Supply chain |
- RFID, GPS and ILS sensors (identification, location, condition) can monitor the condition of goods in the supply chain: light, temperature, g-forces |
|
Give examples of exemplary usage scenario of BI in finance |
- Quantify risks of investment decisions - Identify buying or selling opportunities - Detect fraud and money laundering |
|
Give examples of exemplary usage scenario of BI in IT |
- Monitor reliability of IT operations - Predict where and from whom security threats will emerge |
|
Describe the evolution of BI&A (year and name) |
- 1960: MIS (Management Information Systems) - 1970: DSS (Decision Support Systems) - 1980: EIS (Executive Information Systems) - 1990: DWH (Data Warehousing) - 2000: BI&A (Business Intelligence & Analytics) - 2010-: Big Data (Analytics) |
|
Describe the characteristics of MIS. |
- Efficient data processing - Integrated systems - Vision of automatic decision support |
|
Describe the characteristics of DSS. |
- Statistical algorithms - What-if analysis - Database centricity - Hardwired |
|
Describe the characteristics of EIS. |
- Multi-dimensional modelling - Dedicated systems separated from operational systems - Focus on top management |
|
Describe the characteristics of DWH. |
- Integration of multiple and heterogeneous sources - OLAP analysis - Data history |
|
Describe the characteristics of BI&A |
- Reporting to the masses - Advanced BI front ends - Real-time access - Planning capabilities - Closed loop performance management |
|
Describe the characteristics of Big Data (Analytics) |
- Very large amounts of data - Structured and unstructured data - Advanced analytics |
|
Give both definitions of Decision Making and their author. |
- Decision Making is the process of choosing among two or more alternative courses of action for the purpose of attaining a goal or goals (Turban et al. 2008) - Decision making is the process of sufficiently reducing uncertainty and doubt about alternatives to allow a reasonable choice to be made from among them (Harris, 2012) |
|
Describe the decision making process. |
1. Intelligence: search for conditions that call for decision. 2. Design: invent, develop and analyse possible alternative courses of action (solutions) 3. Choice: select a course of action from among those available 4. Implementation: adapt the selected course of action to the decision situation (i.e. problem solving or opportunity exploring) |
|
What is the definition of Decision support system? |
A decision support system (DSS) is a computer-based information system that supports decision-making activities. |
|
Why use computerized decision support systems? |
- Speedy computation - Improved communication and collaboration - Increased productivity of group members - Improved data management - Quality support - Agility support - Overcoming cognitive limits in processing and storing information |
|
What is the definition of Business Intelligence and Analytics? |
BI&A refers to the techniques, technologies, systems, practices, methodologies and applications that analyze critical business data to help an enterprise better understand its business and market and make timely business decisions. |
|
What are the key characteristics of BI&A 1.0? |
DBMS-based, structured content - RDBMS & data warehousing - ETL & OLAP - Dashboards & scoreboards - Data mining & statistical analysis |
|
What are the key characteristics of BI&A 2.0? |
Web-based, unstructured content - Information retrieval and extraction - Opinion mining - Question answering - Web analytics and web intelligence - Social media analytics - Social network analysis - Spatial-temporal analysis |
|
What are the key characteristics of BI&A 3.0? |
Mobile and sensor-based content - Location-aware analysis - Person-centered analysis - Context-relevant analysis - Mobile visualisation & HCI |
|
What are the Gartner BI Platforms Core Capabilities of BI&A 1.0? |
- Ad hoc query & search-based BI - Reporting, dashboards & scoreboards - OLAP - Interactive visualization - Predictive modeling & data mining |
|
What are the characteristics of Gartner Hype Cycle in BI&A 1.0? |
- Column-based DBMS - In-memory DBMS - Real-time decision - Data mining workbenches |
|
What are the characteristics of Gartner Hype Cycle in BI&A 2.0? |
- Information semantic services - Natural language question answering - Content & text analytics |
|
What are the characteristics of Gartner Hype Cycle in BI&A 3.0? |
- Mobile BI |
|
Draw a high-level framework for BI&A |
See lecture 1, slide 41. |
|
Draw a matrix of the different users of BI&A |
See lecture 1, slide 42. |
|
For which tasks are BI&A platforms actually being used? (List the 8 most common in order) |
1. Use parameterized reports / dashboards 2. View static reporting 3. Interactively exploring and analyzing data 4. Doing simple ad hoc analysis 5. Monitoring performance via a formal scoreboard 6. Using personalized dashboards 7. Executing moderately complex to complex ad hoc analysis and discovery 8. Using predictive analysis and / or data mining models |
|
Comments about typical BI&A usage |
- Strong focus on "traditional" business analysis via reports - More users perform simple than complex analysis - Only very few users conduct predictive analysis and/or use data mining techniques |
|
What features are provided by state-of-the-art BI&A Platforms in terms of information delivery? |
- Reporting - Dashboards - Ad hoc query - Microsoft Office integration - Search-based BI - Mobile BI |
|
What features are provided by state-of-the-art BI&A Platforms in terms of analysis? |
- Online analytical processing (OLAP) - Interactive visualization - Predictive modeling and data mining |
|
What features are provided by state-of-the-art BI&A Platforms in terms of integration? |
- Scorecards - Prescriptive modeling, simulation and optimization - BI infrastructure - Metadata management - Development tools - Collaboration |
|
Describe reporting |
Provides the ability to create formatted and interactive reports, with or without parameters, with highly scalable distribution and scheduling capabilities |
|
Describe dashboards |
Includes the ability to publish web-based or mobile reports with intuitive interactive displays that indicate the state of a performance metric compared with a goal or target value. Increasingly, dashboards are used to disseminate real-time data from operational applications, or in conjunction with a complex-event processing engine. |
|
Describe ad hoc query |
Enables users to ask their own questions of the data, without relying on IT to create a report. In particular, the tools must have a robust semantic layer to enable users to navigate available data sources |
|
Describe Microsoft Office integration |
Sometimes, Microsoft Office (particularly Excel) acts as the reporting or analytics client. In these cases, it is vital that the tool provides integration with Microsoft Office, including support for document and presentation formats, formulas, data "refreshes" and pivot tables. Advanced integration includes cell locking and write-back. |
|
Describe search-based BI |
Applies a search index to structured and unstructured data sources and maps them into a classification structure of dimensions and measures that users can easily navigate and explore using a search interface |
|
Describe mobile BI |
Enables organizations to deliver analytic content to mobile devices in a publishing and/or interactive mode and takes advantage of the mobile client's location awareness |
|
Describe online analytical processing (OLAP) |
Enables users to analyze data with fast query and calculation performance, enabling a style of analysis such as "slicing and dicing". Users are able to navigate multidimensional drill paths. They also have the ability to write back values to a proprietary database for planning and "what if" modeling purposes. This capability could span a variety of data architectures (such as relational or multidimensional) and storage architectures (such as disk-based or in-memory) |
|
Describe interactive visualization |
Gives users the ability to display numerous aspects of the data more efficiently by using interactive pictures and charts, instead of rows and columns |
|
Describe predictive modeling and data mining |
Enables organizations to classify categorical variables and to estimate continuous variables using mathematical algorithms. |
|
Describe scorecards |
These take the metrics displayed in a dashboard a step further by applying them to a strategy map that aligns key performance indicators (KPIs) with a strategic objective |
|
Describe prescriptive modeling, simulation and optimization |
Supports decision making by enabling organizations to select the correct value of a variable based on a set of constraints for deterministic processes and by modeling outcomes for stochastic processes |
|
Describe BI infrastructure |
All tools in the platform use the same security, metadata, administration, portal integration, object model and query engine and should share the same look and feel |
|
Describe meta data management |
Tools should leverage the same metadata and the tools should provide a robust way to search, capture, store, reuse and publish metadata objects, such as dimensions, hierarchies, measures, performance metrics and report layout objects |
|
Describe development tools |
The platform should provide a set of programmatic and visual tools, coupled with a software developer's kit for creating analytic applications, integrating them into a business process and/or embedding them in other applications |
|
Describe collaboration |
Enables users to share and discuss information and analytic content and/or manage hierarchies and metrics via discussion threads, chat and annotations. |
|
Name some complete BI-platforms |
IBM SAP SAS ORACLE Microsoft |
|
Name some platforms with focus on selected BI capabilities |
tableau QlikView InfoZoom Teradata |
|
Describe the gartner magic quadrant for BI&A platform vendors |
- Number of BI&A platform vendors continuously increases - Most vendors are categorized as "niche players" or "leaders" - There are only few "challengers" and "visionaries" |
|
Name the key takeaways from lecture one |
See last slide. |
|
The first definition of data warehouse and its author |
A data warehouse is a pool of data produced to support decision making ... Data are usually structured to be available in a form ready for analytical processing. (Turban, 2008) |
|
The second definition of data warehouse and its author. |
A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process. (Inmon, 1996) |
|
The third definition of data warehouse and its author. |
A copy of transaction data specifically structured for query and analysis. (Kimball, 1996) |
|
What are the characteristics of Data Warehousing? |
- Subject oriented - Integrated - Time variant (time series, chronology) - Nonvolatile (persistent) - Relational/multidimensional - Client/server - Includes metadata |
|
What are the differences between OLTP and Data Warehouse regarding data content, data organization and the nature of data? (OLTP vs Data Warehouse) |
Data content: Current value vs historical data, summarized data, calculated data Data organization: Application by application vs subject areas across enterprise Nature of data: Dynamic vs Static until refreshed, based on frequency |
|
What are the differences between OLTP and Data Warehouse regarding data manipulation and usage? (OLTP vs Data Warehouse) |
Data manipulation: Updated on a field-by-field basis vs accessed & manipulated, usually no direct update Usage: Highly structured, repetitive processing (clerical user) vs highly structured, analytical processing (knowledge user) |
|
What are the differences between OLTP and Data Warehouse regarding response time and updates vs reports? (OLTP vs Data Warehouse) |
Response time: critical (sub-second to several seconds) vs several seconds to minutes Updates & Reports: Real-time updates, batch reporting vs batch updates, real-time reporting |
|
What are the direct benefits of a data warehouse? |
- Allows end users to perform extensive analysis - Allows a consolidated view of corporate data - Better and more timely information access - Enhanced system performance - Simplification of data access |
|
What are the indirect benefits from end users using the direct benefits of Data warehouse? |
- Enhance business knowledge - Create competitive advantage - Enhance customer service and satisfaction - Facilitate decision making - Help in optimizing business processes |
|
What are the three simplified parts of data warehousing architecture? |
- The data warehouse that contains the data and associated software - Data acquisition (back-end) software that extracts data from internal (ERP) systems and external sources, consolidates and summarizes them, and loads them into the data warehouse - Client (front-end) software that allows users to access and analyze data from the warehouse |
|
What are data sources? |
Contain the data to be loaded into the Data Warehouse, e.g. ERP systems, relational databases, flat files, web services |
|
What is enterprise data warehouse (EDW)? |
A centralized repository for the entire enterprise |
|
What is a data mart? |
A departmental data warehouse that stores only relevant data. |
|
What is a dependent data mart? |
A subset that is created directly from a data warehouse. Consistent data (integrated) |
|
What is an independent data mart? |
A small data warehouse designed for a strategic business unit or a department |
|
What is the main characteristic, pros and cons of data mart centric architecture? |
Independent data marts + Easy to build organizationally + Easy to build technically - Business enterprise view unavailable - Redundant data costs - High ETL costs - High App costs - High DBA and operational costs |
|
What is the main characteristic, pros and cons of the virtual, distributed and federated architecture? |
Leave data where it lies + No need for ETL + No need for separate platform - Only viable for low volume - Metadata issues - Network bandwidth and join complexity issues - Workload typically placed on workstations (this never works) |
|
What is the main characteristic, pros and cons of the hub-and-spoke architecture? |
Dependent data marts + Allows easier customizations of user interfaces and reports - Business enterprise view challenging - Redundant data costs - High DBA and operational costs - Data latency (most common) |
|
What is the main characteristic, pros and cons of enterprise data architecture? |
Centralized integration data with direct access + Business enterprise view + Design consistency & data quality + Data reusability - Requires corporate leadership and vision |
|
Name the first 5 factors that potentially affect the architecture selection decision |
1. Information interdependence between organizational units 2. Upper management's information needs 3. Urgency of need for a data warehouse 4. Nature of end-user tasks 5. Constraints on resources |
|
Name the last 5 factors that potentially affect the architecture selection decision |
6. Strategic view of the data warehouse prior to implementation 7. Compatibility with existing systems 8. Perceived ability of the in-house IT-staff 9. Technical issues 10. Social/political factors |
|
Describe the Kimball model |
Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is a conformed dimension of the data marts. Hence a unified view of the enterprise can be obtained from dimension modeling on a local department level (Turban et al., 2007) (see also lect. 2, sl. 21) |
|
Describe the Inmon model |
Inmon believes in creating a data warehouse on a subject-by-subject area basis. Hence the development of the data warehouse can start with data from the online store. Other subject areas can be added to the data warehouse as their needs arise. Point-of-sale (POS) data can be added later if management decides it is necessary. (Turban et al., 2007) (see also lect. 2, sl. 22) |
|
Describe the difference between data mart approach and EDW approach in terms of overall approach, complexity and development methodology. |
- Overall approach: Bottom-up vs Top-down - Complexity: High vs Low - Development methodology: iterative vs step-wise |
|
Describe the data mart approach in terms of architecture structure |
- Data mart is subject-oriented (e.g. for single business processes) or department-oriented (e.g. only for sales) - "Build one data mart at a time" -> the DW is developed sequentially - DW = collection of data marts |
|
Describe the EDW approach in terms of architecture structure |
- One central EDW provides the consistent and comprehensive view of the enterprise - Data marts are optional supplements for specific departments or subjects - Data marts are based on the EDW. That means, they get their data from the EDW. |
|
Describe the difference between data mart approach and EDW approach in terms of scope, development time, cost, difficulty, size, freq of update and no. of users (there are more on lec.1 sl. 26) |
- Scope: one subject area vs. several subject areas - Time: months vs. years - Cost: $10,000 to $100,000+ vs $1,000,000+ - Diff.: Low to medium vs. high - Size: MB to several GB vs. GB to PB - Freq. of upd.: hourly, daily, weekly vs daily, weekly - No. of users: 10s vs 100s to 1000s |
|
When modeling a data warehouse, which four perspectives are included? |
1. The principle design approach for building the data warehouse; typically one distinguishes Kimball vs Inmon 2. The multidimensional data model as a foundation of data warehouse design 3. Relational data models containing data, e.g. operational and master data 4. Metadata describing the structure of all data warehouse data. |
|
What are measurable business facts? |
Facts are quantities with explanatory power for diagnosis, monitoring and coordination of a system, e.g. revenue, profit, costs (not derived facts/KPIs). Facts include descriptive attributes, such as currency, unit, range of values |
|
What are dimensions in the multidimensional model? |
A business fact can be viewed and analyzed along different perspectives (e.g. time, space etc). Further hierarchical structures can also be added to dimensions (the time dimension can be structured in Year, Quarter, Month ...) |
|
What different types of facts are there? |
- Additive: additive aggregation along all dimensions possible - Semi-additive: additive aggregation only for selected dimensions - Non-additive (e.g. average values, percentages) |
|
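The three fact types from the card above can be made concrete with a tiny, made-up fact table (the rows, stores and figures are illustrative only):

```python
# Hypothetical mini fact table: (month, store, revenue, stock_level)
rows = [
    ("Jan", "Stuttgart", 100, 50),
    ("Jan", "Berlin",    200, 70),
    ("Feb", "Stuttgart", 150, 40),
    ("Feb", "Berlin",    250, 60),
]

# Additive fact: revenue can be summed along every dimension.
total_revenue = sum(r[2] for r in rows)  # 700

# Semi-additive fact: stock levels may be summed across stores
# within one month, but not along the time dimension
# (summing stock over months would double-count inventory).
stock_jan = sum(r[3] for r in rows if r[0] == "Jan")  # 120

# Non-additive fact: an average must be recomputed, never summed.
avg_revenue = total_revenue / len(rows)  # 175.0
```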
What are the differences between star schema and snowflake in terms of table structure? |
Both the star schema and the snowflake schema have one fact table and multiple dimension tables. However, the snowflake model additionally has several attribute tables. |
|
What are the differences between star schema and snowflake in terms of dimension normalization, modeling effort and data compression? |
- Dimension normalization: No dimension normalization vs Dimension normalization (3NF) - Modelling effort: Low vs high - Data compression: Low vs high |
|
What are the differences between star schema and snowflake in terms of query performance with text filters? |
Slow vs fast, e.g. "sum of all sales from the Stuttgart store" |
|
Explain the three steps of the data provision process. |
ETL: - extraction (reading data from a database) - transformation (i.e. converting the extracted data from its previous form into the form in which it needs to be) - and load (putting the data into the data warehouse) |
|
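A minimal sketch of the three ETL steps in plain Python (source records, field names and the "warehouse" dict are invented for illustration):

```python
# Assumed raw source records from an operational system.
source = [
    {"order_id": 1, "amount_eur": 100.0, "deleted": False},
    {"order_id": 2, "amount_eur": 50.0,  "deleted": True},
]

def extract(records):
    """Extraction: read raw records from the source system."""
    return list(records)

def transform(records):
    """Transformation: filter deleted orders, convert to the target schema."""
    return [
        {"order_id": r["order_id"], "amount": r["amount_eur"]}
        for r in records
        if not r["deleted"]
    ]

def load(records, warehouse):
    """Load: put the transformed records into the warehouse store."""
    for r in records:
        warehouse[r["order_id"]] = r
    return warehouse

warehouse = load(transform(extract(source)), {})
```

The deleted order is filtered out during transformation, so only order 1 reaches the warehouse.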
Describe in what ways extraction can be differentiated. |
- Synchronous / asynchronous access - File based extraction / stream based extraction - Full extraction / delta extraction - Usage of filters / no filters - Standard extractors /custom extractors |
|
Into which four activities can transformation be subdivided? |
- Filtering (e.g. filter all deleted orders) - Harmonization (e.g. resolve master data inconsistencies) - Enrichment (e.g. calculate new facts from existing ones) - Aggregation (e.g. by minimizing a dimension) |
|
Describe the load phase |
Data is updated in the final data storage (e.g. a cube) - Full load vs Delta load - Daily, Weekly, Monthly load Load process has to be customized to the chosen data model (star vs. snowflake etc.) Data quality mechanisms are often implemented (e.g. uniqueness, mandatory fields) to be triggered during the load |
|
Describe ETL automation |
- Usually not triggered manually, but run automated; trigger types are time or event triggers - Can be programmed or modeled to be automated - May be visualized to show the sequence of steps - Logging and monitoring functionality supports the Data Warehouse administrators in case of errors during the ETL process - ETL automation is an important part of every BI project and can require a large portion of the project efforts |
|
What is the main difference between metadata and master data? |
Metadata = information about data Master data = non-transactional business data (Further info: slide 55 lecture 2) |
|
What types of metadata are there and what are their goals? |
Business: Explain what things mean Technical: Technical description of data assets Operational: Monitor job execution (Further info: slide 57, lec 2) |
|
Name potential benefits of metadata mgmt? |
- Build common understanding of data - Facilitate the quest for data quality - Support discovery and reuse of data - Analyse dependencies - Facilitate future changes - Monitor usage (Details slide 57 and 28 lecture 2) |
|
What four dimensions should be differentiated when speaking about big data? |
- Volume - terabytes - Variety - structured, unstructured, text & multimedia - Velocity - streaming data - Veracity - imprecise data types (uncertainty) (the four v's) |
|
Name three reasons why the big data trend probably will continue to accelerate in the coming years |
1. IoT accelerates data generation with sensors 2. The media spectrum is broadened by image, audio and video 3. Faster computers produce more complex simulation results that need to be analyzed |
|
What are the opportunities of Big Data? |
- Simulations, sentiment analysis, network analysis |
|
What are the threats of the Big Data trend? |
- Increased dependency on systems - Privacy, data security, and ethics |
|
What technological changes have been made to enable big data? |
Data processing: - from 32-bit to 64-bit processing - from single core sequential to multi core parallel processing Data storage: from disk storage to in-memory Data organization: - from row-based to column-based databases - from simple vectors to dictionary encoding |
|
Describe column-based databases |
Group by attributes instead of ID. Allow much faster data access for typical BI operations |
|
Explain dictionary encoding |
- In order to increase data compression and improve search performance, BI systems utilize indexing - Column-based BI systems use single-attribute vectors. - Instead of dimensions, simple dictionaries are applied. (see slide 23, lect. 3) |
|
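Dictionary encoding, as described in the card above, can be sketched in a few lines (the column values are made up; real column stores do this on compressed binary vectors):

```python
# A string column with few distinct values, as typical for dimensions.
column = ["Stuttgart", "Berlin", "Stuttgart", "Stuttgart", "Berlin"]

# Dictionary: one entry per distinct value.
dictionary = sorted(set(column))               # ['Berlin', 'Stuttgart']
code = {value: i for i, value in enumerate(dictionary)}

# Attribute vector: the column stores only small integer codes,
# which compresses the data and speeds up scans.
attribute_vector = [code[v] for v in column]   # [1, 0, 1, 1, 0]

# A text filter ("all rows for Stuttgart") becomes an integer comparison.
stuttgart_rows = [i for i, c in enumerate(attribute_vector)
                  if c == code["Stuttgart"]]   # [0, 2, 3]
```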
What's the difference between traditional and big data approach? |
Traditional: Business users determined what questions to ask -> IT structured data accordingly Big data approach: IT delivers a platform to enable creative discovery -> business explores what questions could be asked (however, both overlap each other) |
|
What are the differences between traditional computing and stream computing? |
Historical fact finding vs. current fact finding Find and analyze information stored on disk vs. analyze data in motion - before it is stored Batch paradigm, pull model vs. low latency paradigm, push model Query-driven: submits queries to static data vs. data driven - bring the data to the query Query->Data->Result vs. Data->Query->Result |
|
What is the motivation for stream computing? |
- Data might be outdated before users are able to analyze it - Data rates and volumes are too great for storing and subsequent analysis |
|
What is the problem with analyzing data, the solution and the goal? |
Time required for analyzing data is very long. Solution: parallel analysis of data. Goal: Do not analyze a query on a single server! Instead, distribute the query across an entire network and process it in parallel! |
|
What are the challenges of parallel computing? |
- Difficult to decide how to split data and computation across the network - Risk of server failure |
|
How does Hadoop solve the challenges of parallel computing? |
- Node failure: Store data redundantly on multiple nodes within the network -> if one node fails, 2 other nodes can be used instead - Low bandwidth: Store data at nodes and remember where it is stored -> data does not need to be distributed to nodes anymore -> only the computation needs to be distributed - Development of program code for parallel processing is difficult: provide a simple map-reduce algorithm to the developer to handle work distribution and management of nodes automatically |
|
Describe how HDFS works |
Split the data into partitions (blocks); each block is replicated three times across 3 nodes. A master node stores metadata (e.g. file names, locations, ...) |
|
Describe MapReduce algorithms |
Map function: - Takes a set of key-value pairs - Creates a set of zero or more key-value pairs - Input and output pairs are usually different Reduce function: - Executed for each key - aggregates all values according to the key |
|
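The map and reduce roles from the card above can be sketched as a word count in plain Python (no Hadoop; the shuffle phase that a real framework performs is simulated with a dict):

```python
from collections import defaultdict

def map_fn(_, line):
    """Map: takes a (key, value) pair, emits zero or more (word, 1) pairs."""
    for word in line.split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: executed once per key, aggregates all values for that key."""
    return word, sum(counts)

lines = ["big data big", "data warehouse"]

# Shuffle phase: group intermediate pairs by key.
grouped = defaultdict(list)
for i, line in enumerate(lines):
    for word, one in map_fn(i, line):
        grouped[word].append(one)

result = dict(reduce_fn(w, cs) for w, cs in grouped.items())
# result == {"big": 2, "data": 2, "warehouse": 1}
```

In Hadoop, the framework distributes the map calls across nodes and routes each key to one reducer; the logic per function stays this simple.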
Describe denormalized data models and when to use it |
Store data redundantly. Also known as "embedded data models". Every entry is stored in one data piece. Use: - If you have a "contains" relationship (i.e., 1:1 relationships) - If you have 1:n relationships |
|
What are the benefits and downsides of denormalized data models? |
+ Request and retrieve related data in a single database operation + Better performance for read operations + Update related data in a single write operation - Database records may grow after creation - Database record growth can impact write performance - Threat of data fragmentation |
|
Describe normalized data models and when to use it. |
Normalized data models describe relationships using references between data pieces. Use: - If denormalized data models provide only little read performance advantage - If modeling m:n relationships - If modeling large hierarchical data sets |
|
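The contrast between the two model styles can be sketched with plain dicts standing in for documents (entity names and fields are invented for the example):

```python
# Denormalized / embedded: the order contains its customer,
# so a single read fetches all related data.
order_embedded = {
    "order_id": 1,
    "customer": {"customer_id": 7, "name": "Acme"},
}

# Normalized / referenced: the order only stores the customer id;
# resolving it requires a second lookup (a "join" in application code).
customers = {7: {"customer_id": 7, "name": "Acme"}}
order_referenced = {"order_id": 1, "customer_id": 7}

def customer_name(order):
    if "customer" in order:
        # Embedded model: one access.
        return order["customer"]["name"]
    # Referenced model: follow the reference.
    return customers[order["customer_id"]]["name"]
```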
What is SQL? |
- SQL is a standardized, powerful, high level programming language for querying databases - Almost all relational databases support SQL - Relations (=tables) are linked via references -> SQL primarily supports normalized data models! |
|
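A small runnable example using `sqlite3` from the Python standard library, querying a normalized star-like layout; the table and column names are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact_sales (store_id INTEGER, revenue REAL);
    INSERT INTO dim_store VALUES (1, 'Stuttgart'), (2, 'Berlin');
    INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 75.0);
""")

# Dimension tables are linked to the fact table via references,
# so the text filter requires a join.
row = con.execute("""
    SELECT SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_store d ON d.store_id = f.store_id
    WHERE d.city = 'Stuttgart'
""").fetchone()
stuttgart_sales = row[0]   # 150.0
```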
What are the advantages and downsides of SQL? |
+ Standardized and implemented by basically every relational database system + Powerful -> many operations exist + Many people (also from the business side) are able to write SQL code + High-level code + ACID (atomicity, consistency, isolation, durability) - Data structures (e.g. tables) need to be defined upfront - SQL code always needs to be translated into low-level code -> reduces performance - Rather slow execution |
|
What are relational data models mainly used for? |
Transactions and analysis of transactional data |
|
SQL vs NoSQL |
SQL: - The notion of SQL is typically used to refer to relational data models and/or normalized data models NoSQL: - The notion of NoSQL is typically used to refer to non-relational data models and/or denormalized data models |
|
Give examples of non-relational data model approaches in NoSQL |
Key-Value stores: + Very simple to program and implement + Can be easily distributed across multiple machines Document-oriented store: + Still simple to program and implement + Can be easily distributed across multiple machines + More structure than key-value stores Graph-oriented store: + Flexible extension of data model |
|
T/F: Big data platforms are to process unstructured data |
F: Structured data is part of almost all engagements |
|
T/F: Big data technology requires huge amounts of data |
F: It is more about flexibility than pure volume |
|
T/F: Big data = Apache Hadoop |
F: Hadoop is a well-known platform, but depending on the use case other platforms are better suited |
|
T/F: Big Data makes traditional BI / DWH platforms obsolete |
F: Databases and BI will co-exist with Big Data technologies. |
|
T/F: Big Data requires new skills |
T: New job roles are needed, for instance, data scientists. |
|
T/F: Big Data changes IT architecture |
T: Sandboxes, deep data zones, queryable archives... |
|
T/F: Big Data requires focus on data security and privacy |
T |
|
T/F: Big Data fits best with agile methods |
T |
|
T/F: Big Data affects only some industries |
F: There is probably no industry without a use case for Big Data |
|
T/F: Big data is a hype |
T & F |
|
What three target groups are there for BI consumption and what reports do they demand? |
- Executive management: performance management, dashboards, scoreboards, KPIs - Business analysts: ad-hoc queries, On-Line Analytical Processing (OLAP) - Front-line employees: operational or standard reports |
|
Describe the dissemination of BI consumption |
94%: Reporting => information delivery 5%: Data analysis: Data analysts & power users (controlling, purchasing, etc.) 1%: Data mining: small amount of expert users in dedicated analysis departments |
|
Describe the BI consumption taxonomy |
Visualization: Table vs Graph Singularity: Single report vs Multiple reports Interaction: no vs. low vs. high |
|
Describe the taxonomy of business reporting |
Table report: Table, Single, No/Low interaction Graph report: Graph, Single, No/Low interaction Dashboard: Graph, Multiple, No/Low interaction |
|
Describe the taxonomy of Analytical reporting |
OLAP report: Table vis., Single report, High inter. |
|
What is the definition of Business Reporting? |
All types of BI consumption covering: - Efficient (visual) communication of data - with limited interactivity - and limited analytic capabilities |
|
What kind of table reports exists regarding creation, execution and delivery? |
Creation: - Standard reports (created by developer, consumed by user) - Ad-Hoc Reports (created and consumed by user) Execution and delivery: - Manual execution - Automatic execution (and delivery) |
|
What characteristics do table reports have? |
- Simplest form of BI-reports - Two basic forms: one-dimensional (list) and two dimensional (matrix) - Often a small data scope (e.g. department) - Used by all kind of users - Reports are usually parameterized |
|
What report creation trigger types are there? |
- Periodic report: created according to a certain schedule (monthly analysis of sales by customer) - Special/exceptional report: created when something out of the ordinary happens (a defined threshold has been exceeded) |
|
In what four basic ways can exceptions be incorporated into (trigger) reports? |
- Prepare the report only when exceptions occur - Highlight the exceptions - Group the exceptions together (e.g. a specific "deviation" column) - Comparison of actual and planned figures; basic statistical formulas are used (variance, median) |
|
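The deviation-and-threshold idea from the card above can be sketched in code. This is a minimal illustration with made-up figures; the threshold, column names and data are all hypothetical, not from the lecture.

```python
# Sketch (hypothetical data): build a "deviation" column and flag exceptions
# where actual figures miss planned figures by more than a threshold.

THRESHOLD = 0.10  # flag rows deviating more than 10% from plan

rows = [
    {"region": "North", "planned": 100, "actual": 95},
    {"region": "South", "planned": 100, "actual": 80},
    {"region": "West",  "planned": 200, "actual": 230},
]

def with_deviation(rows):
    out = []
    for r in rows:
        dev = (r["actual"] - r["planned"]) / r["planned"]
        out.append({**r, "deviation": dev, "exception": abs(dev) > THRESHOLD})
    return out

report = with_deviation(rows)
# "prepare the report only when exceptions occur" / "group the exceptions":
exceptions = [r for r in report if r["exception"]]
for r in exceptions:
    print(f'{r["region"]}: {r["deviation"]:+.0%}')
```

A real reporting tool would render the deviation column and highlighting instead of printing, but the exception logic is the same.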
What are the three steps to visualize data? |
1. Choose visual representation 2. Arrangement of visual elements 3. Selection and (de-)emphasis of interesting data |
|
Describe what visual representation means |
- Representation: mapping of available information to a visual form - Data objects, their attributes, and the relationships among data objects are translated into graphical elements such as points, lines, shapes and colors |
|
What three dimensions can visual representation be organized in? Give examples |
Data type (one-/two-/multi-dimensional, text) Graph type (Bar, Line, Pie, Radar) Interaction & distortion technique (Standard, Projection, filtering, zoom) |
|
Explain what different types of data there are |
- One-dimensional data (time-variant data, e.g. stock prices) - Two-/three-dimensional data (e.g. geographical data) - Multidimensional data (typical for data mining tasks, no "obvious" mapping of multiple dimensions) - Special data types (textual data, graph/network data) |
|
Explain what different types of graph there are |
- Bar graph (and variations) - Line graph (and variations) - Area graph - Pie/ring graph - Tree map - Radar graph - Gauges and meters - Bullet graph - Box/scatter plots |
|
Describe the motivation for interaction techniques and important concepts |
- Interaction allows for more dynamic analysis of data - Helps to encourage exploration Important concepts: - Direct manipulation strategies - Rapid, incremental and reversible actions - Selection by pointing (not typing) - Immediate and continuous feedback |
|
Describe what interaction techniques there are |
- Dynamic projection (dynamically change the visualization to explore multidimensional data sets) - Interactive filtering: browsing (can be difficult for big data sets) or querying (need to specify a subset) - Zooming - Distortion (show some part of the data in high detail) - Brushing and linking (a selection in one visualization is fed into another, with the selected instances highlighted in some way) |
|
When is a bar graph good to use? |
- For displaying fact(s) associated with nominal (e.g. region) or ordinal attributes (e.g. size) - For comparing values with each other |
|
What bar graph variations are there? |
- Stacked bar graph: good to display multiple instances of a whole and its parts, with focus on the whole - Grouped bar graph: good to display multiple instances of a whole and its parts, with focus on the parts |
|
When is a line graph appropriate to use? |
- Good to display fact(s) associated with interval attributes (e.g. time) - Good to reveal shape of data (e.g. movements up and down), especially changes over time |
|
What are sparklines? |
A variation of the line graph: a very space-efficient representation which is often used to display changes of multiple data sets in a dashboard (invented by Edward Tufte) |
|
When is a bar/line graph combination appropriate? |
Good if you want to combine visualizations of data changes and data comparison in one graph, e.g. expenses (bar chart) and profits (line chart) |
|
What are the problems with area graphs? |
- Occlusion (some of the values are hidden by an overlaying area) - Inaccurate interpretation |
|
When is a pie graph good and what are the disadvantages? |
+ Good to display a whole and its parts, but often this can be done better with a bar graph - Difficult to assess size / value of parts - Difficult to differentiate colors (if there are many) - Constant eye movements between graph and legend |
|
What are some characteristics of tree maps? |
- The treemap is a space-constrained visualization of hierarchical structures - Can show attributes of leaf nodes using size and color (brightness and hue) - Easy to navigate into sub-trees (cf. interaction techniques) |
|
What is a radar graph and when is it appropriate to use? |
A radar graph is a circular graph that encodes values on separate axes radiating from the center. Usually a bar graph is better to use due to its better readability (exceptions: HR competences, and when the axes can naturally be arranged as a circle, e.g. the hours of a day) |
|
What are gauges and meters and what are their issues? |
- They display the value of a single fact, sometimes compared to related values (e.g. targets) or color-encoded ranges - Common in dashboards, but take up a lot of space - Color coding (red vs green) cannot be perceived by all people (better to use a bullet graph) |
|
What are bullet graphs? |
- A variation of a bar graph developed by Stephen Few - Can serve as a replacement for dashboard gauges and meters |
|
What are box plots? |
- A way of displaying the distribution of data - Condensed visualization of core statistical parameters |
|
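The core statistical parameters a box plot condenses can be computed with Python's standard library; the data set below is invented for illustration, and the 1.5-IQR whisker rule is the common convention rather than the only option.

```python
# Sketch: the statistics a box plot visualizes (quartiles, median,
# whisker bounds, outliers), computed with the standard library.
import statistics

data = [2, 4, 4, 5, 6, 7, 8, 9, 12, 30]

q1, median, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1
lower_whisker = q1 - 1.5 * iqr  # points beyond the whiskers are drawn as outliers
upper_whisker = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_whisker or x > upper_whisker]
```

The box spans q1 to q3 with a line at the median; here the value 30 falls outside the upper whisker and would be plotted as an individual outlier point.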
Name some characteristics of scatter plots |
- Additional attributes can be displayed using size, shape and color of the markers - Correlation structures can be recognized easily |
|
Comments about visualizations |
Helpful to: - explore data - confirm hypotheses - communicate findings (less is more) |
|
What is the definition of dashboard |
A dashboard is a visual display of the most important information needed to achieve one or more objectives, consolidated and arranged on a single screen so the information can be monitored at a glance |
|
What does the business need from a dashboard |
- High impact visualization of key metrics - Easy to use and find information - Intuitive user interface and navigation - Ability to manage and monitor metrics effortlessly - Actionable analyses - Drillable metrics |
|
Give some examples of good and bad dashboard qualities |
+ Important values are color-coded + Very cautious use of color + Compact visualization of extensive amounts of information + Homogeneous, consistent usage of graph types - Too many colors - Bad scaling (hard to compare) - Bad |
|
What are the six gestalt principles |
- Proximity - Similarity - Closure - Enclosure - Connection - Continuity |
|
Name some common mistakes in dashboard design |
1. Exceeding the boundaries of a single screen 2. Supplying inadequate context for the data 3. Displaying excessive detail or precision 4. Choosing a deficient measure 5. Choosing inappropriate display media 6. Introducing meaningless variety 7. Using poorly designed display media 8. Encoding quantitative data inaccurately 9. Arranging the data poorly 10. Highlighting important data ineffectively or not at all 11. Cluttering the display with useless decoration 12. Misusing or overusing color 13. Designing an unattractive visual display |
|
What is OLAP? |
Technologies and tools that support (ad-hoc) analysis for multi-dimensionally aggregated data (Table, Single report, High Interaction) |
|
What different types of OLAP are there? |
MOLAP: (a lot of computing power, copy data) - data resides in a multidimensional DBMS - a multidimensional engine (OLAP server) provides access ROLAP: - data resides in a relational DBMS - the OLAP server generates SQL queries HOLAP: - detailed data resides in a relational DBMS - aggregated data resides in a multidimensional DBMS |
|
What are some typical OLAP operations? |
Roll up (drill-up): summarize data - by climbing up a hierarchy or by dimension reduction Drill down (roll down): reverse of roll up - from higher-level summary to lower-level summary or detailed data Slice and dice: - filter using one or more dimensions Pivot (rotate): - reorient the cube, e.g. visualize 3D data as a series of 2D planes |
|
Describe roll-up and drill-down |
Drill-down (roll-down): - from higher-level summary to lower-level summary or detailed data, or introducing new dimensions Roll-up (drill-up): - the reverse - summarize data by climbing up a hierarchy or by dimension reduction |
|
Describe how slice and dice works. |
The slice operation represents "cutting out" one slice of an n-dimensional cube by using a filter on one dimension (results in an (n-1)-dimensional cube) The dice operation represents "cutting out" a small cube from a big one by filtering on several dimensions (results in a smaller cube, a dice) |
|
Describe pivot/rotate operations |
Swapping the rows and columns, or moving one of the row dimensions into the column dimension |
|
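The OLAP operations described in the cards above can be mimicked on a tiny fact table in plain Python. The dimensions and figures are hypothetical; a real OLAP server would perform these operations over a DBMS rather than over in-memory tuples.

```python
# Sketch: roll-up, slice and dice on a toy fact table
# with dimensions (year, quarter, region, product) and a sales fact.
from collections import defaultdict

facts = [
    ("2023", "Q1", "EU", "A", 10),
    ("2023", "Q1", "US", "A", 20),
    ("2023", "Q2", "EU", "B", 30),
    ("2024", "Q1", "EU", "A", 40),
]

def roll_up(facts, keep):
    """Roll up: summarize by keeping only the dimensions at the given indices."""
    agg = defaultdict(int)
    for *dims, sales in facts:
        agg[tuple(dims[i] for i in keep)] += sales
    return dict(agg)

# Roll up from (year, quarter, region, product) to (year, region):
by_year_region = roll_up(facts, keep=[0, 2])

# Slice: fix one dimension (region == "EU") -> an (n-1)-dimensional sub-cube
eu_slice = [f for f in facts if f[2] == "EU"]

# Dice: filter on several dimensions -> a smaller cube
dice = [f for f in facts if f[0] == "2023" and f[2] == "EU"]
```

Drill-down is simply the reverse direction: going from `by_year_region` back to the detailed `facts`.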
What is the motivation of big data analytics? |
The amount of data to be analyzed is constantly growing. A sole concentration on manual/interactive analysis methods like table reports or OLAP is not sufficient anymore. Methods and tools that semi-automatically generate knowledge from large data sets and documents are needed |
|
What 5 challenges are there of advanced analytics? |
- Forecasting (how do historical sales translate..) - Key influencers (what are the main influencers of success/failure) - Trends (what are the trends: historical/emerging) - Relationships (what are the correlations in the data) - Anomalies (what anomalies..) |
|
What is the definition of Knowledge Discovery (in Databases)? |
Knowledge Discovery in Databases (KDD) is a non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. (Fayyad et al. 1996) |
|
Definition: Hypothesis vs discovery |
- Hypothesis-driven approach: begins with a proposition by the user, who then seeks to validate the truthfulness of the proposition - Discovery-driven approach: finds patterns, associations, and relationships among the data in order to uncover facts that were previously unknown or not even contemplated by the organization |
|
Definition: supervised learning vs unsupervised learning |
Supervised: - Goal: predict data with unknown target attribute value with minimal error - Search for dependencies of a target attribute on the input data Unsupervised: - Goal: find patterns or a more compact description of the data - No reference to a target attribute, error not measurable |
|
Knowledge Discovery vs. Statistical approaches |
- It was expected that knowledge discovery would substitute classical statistical approaches - There was the hope that knowledge discovery could be successfully applied without experience and knowledge about the methods - In fact, they complement each other, and software tools have merged together |
|
What is the definition of data mining? |
"Data mining is a process that uses statistical, mathematical, artificial intelligence, and machine learning techniques to extract and identify useful information and subsequent knowledge from databases. Data mining is used for finding mathematical patterns from usually large sets of data. These patterns can be rules, affinities, correlations, trends, or prediction models." (Nemati and Barko, 2001) |
|
Give some examples of supervised and unsupervised data mining techniques |
Unsupervised: association rules, K-means clustering, hierarchical clustering (clustering and association rules) Supervised: SVM, Naive Bayes, decision trees, neural networks (classification) |
|
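As a sketch of one unsupervised technique from the card above, here is a minimal K-means clustering loop in pure Python. The one-dimensional toy data and hand-picked initial centers are hypothetical; real implementations (e.g. scikit-learn's KMeans) handle initialization, convergence checks and higher dimensions properly.

```python
# Sketch: K-means clustering on toy 1-D data with fixed initial centers.

def kmeans(points, centers, iters=10):
    clusters = [[] for _ in centers]
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # update step: each center moves to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centers, clusters = kmeans(points, centers=[0.0, 10.0])
```

This is unsupervised in exactly the card's sense: there is no target attribute, only a more compact description of the data (two cluster centers).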
What are association rules and what two parameters are relevant? |
- Association rules describe correlations between attributes appearing together in transactions - Confidence: strength of the correlation - Support: frequency of appearance |
|
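The two parameters from the card above can be computed directly. The market-basket transactions below are invented toy data, and the rule {bread} -> {butter} is just an example candidate rule.

```python
# Sketch: support and confidence for a candidate association rule
# {bread} -> {butter} over toy market-basket transactions.

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"beer", "chips"},
]

def support(itemset):
    """Frequency of appearance: share of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Strength of correlation: support(lhs union rhs) / support(lhs)."""
    return support(lhs | rhs) / support(lhs)

sup = support({"bread", "butter"})        # 2 of 4 transactions
conf = confidence({"bread"}, {"butter"})  # 2 of the 3 bread-transactions
```

Mining algorithms such as Apriori search for all rules whose support and confidence exceed user-defined minimum thresholds.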
Describe decision tree shortly |
- A decision tree is a set of logical rules - A decision tree is an intensional description of a given set of classes - Important: decision trees are easier to read and understand than logical rules |
|
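The card's point that a decision tree is a set of readable logical rules can be illustrated as nested conditions. The attributes, thresholds and class labels below are entirely hypothetical; a learned tree would derive them from training data.

```python
# Sketch: a (hypothetical) decision tree for credit risk written as code.
# Each root-to-leaf path corresponds to one logical rule.

def classify(income, has_debt):
    if income > 50_000:
        return "low risk"     # rule: income > 50k            -> low risk
    if not has_debt:
        return "medium risk"  # rule: income <= 50k AND no debt -> medium risk
    return "high risk"        # rule: income <= 50k AND debt    -> high risk
```

Reading the rules off the branches is exactly why decision trees are considered easier to understand than an unordered list of logical rules.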
What is the goal of predictive analytics? |
Unlock data to move decision making from sense & respond to predict & act |
|
Describe what types of analytics there are, their goals and an example. |
Descriptive: summarize what happened - the vast majority of analytics is descriptive (e.g. OLAP) Predictive: make predictions about the future - utilize a variety of statistical, modeling, data mining, and machine learning techniques to study recent and historical data (e.g. sentiment analysis) Prescriptive: recommend one or more courses of action and show the most likely outcomes of each action |
|
What is predictive analytics, two definitions? |
"... the exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules" "... the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition techniques as well as statistical and mathematical techniques." |
|
What is the definition of text mining? |
"Applications of data mining to non-structured or less structured text files. It entails the generation of meaningful numerical indices from the unstructured text and then processing these indices using various data mining algorithms on large databases" (Turban et al., 2007) |
|
Why is textual data very different from database-like data? |
- is unstructured (text) or semi-structured (e.g. markup with HTML, XML) - may be interlinked (e.g. hyperlinks) - is very heterogeneous (different languages, spelling errors) |
|
Shortly: text classification and text clustering |
- Text classification: "Text categorization is the activity of labeling natural language texts with thematic categories from a predefined set." (text as vector, then standard classification algorithms) - Text clustering: "...the partitioning of texts into previously unseen categories..." (text as vector, then standard clustering algorithms) |
|
What is the definition of web mining? |
"The discovery and analysis of interesting and useful information from the Web, about the Web, and usually through Web-based tools." |
|
What is the definition of business performance management? |
“Business Performance Management enables an organization to effectively monitor, control and manage the implementation of strategic initiatives” |
|
What are the pros and cons of spreadsheets regarding business planning? (64% of planning tools used) |
+ Small businesses + Extremely individual requirements + Short-term need - Process control - Access protection - Performance - Complexity - Errors - Consolidation - Seasonal trend model - Organizational changes - Growing company |
|
Define a balanced scorecard |
“A balanced scorecard is a comprehensive set of performance measures defined from four different measurement perspectives (financial, customer, internal, and learning and growth) that provides a framework for translating the business strategy into operational terms” (Kaplan and Norton, 1996) |
|
What four dimensions are there of a balanced scoreboard? |
- Financial (should serve as a focal point for all objectives and measures in the other perspectives) - Customer (enables companies to align their customer outcome measures, e.g. satisfaction) - Internal (focuses metrics on the processes that will deliver the objectives for customers/shareholders) - Learning & Growth (develops objectives and measures to drive learning for the other three perspectives) |
|
What is a strategy? Draw a strategy map. |
"A strategy is a set of hypotheses about cause and effect" (map on slide 81, lecture 5-6) |
|
What five steps does the integrated, closed-loop strategy to execution cycle include? |
1. Develop, formulate and syndicate strategy 2. Translate and cascade strategy 3. Operationalize strategy 4. Monitor and optimize strategy execution 5. Validate and adapt strategy |
|
Draw the "Big Picture" |
Slide 87, lecture 5-6 |
|
What is the motivation for process intelligence? |
- BI in its classical form rather looks at outcome-oriented, high-level KPIs, decoupled from the actual business processes - Process intelligence looks at the process level and focuses on keeping operational performance transparent at all times |
|
What is the definition of (Business) Process Intelligence? |
"(Business) Process Intelligence (BPI) refers to the application of business intelligence techniques to business processes" (Grigori et al. 2004). Extension to Grigori et al.'s (2004) definition: "BPI comprises a large range of application areas spanning from process monitoring and analysis to process discovery, conformance checking, prediction and optimization." |
|
Draw the Soh and Markus (1995) model |
Slide 5 and 7, lecture 7 |
|
What is the definition of organizational adoption process? |
Organizational adoption (process) involves all actions of individuals in an organization that deal with creating awareness, selecting, evaluating, initiating and deciding for the implementation of new ES technology. |
|
What is the definition of the conversion process? |
Conversion (process) involves all actions of individuals (in an organization or across organizations) that deal with developing and implementing a new ES technology. |
|
What is the definition of the use process? |
Use (process) involves all actions of individuals in an organization that deal with using and changing ES technology or the respective work system to realize intended business value. |
|
Draw the key performance indicators pyramid |
Slide 11, lecture 7 |
|
What are some pre-adoption / organizational adoption activities? |
- KPI alignment - BI strategic alignment - Governance - BI vendor selection - Organizational structure - Controlling |
|
Describe the BI project lifecycle |
1. Justification 2. Planning 3. Business analysis 4. Design 5. Construction 6. Deployment |
|
How does a BI strategy benefit IT? |
- Helps align with business partners and formalize business needs - Creates a prioritized roadmap for the enterprise of short-, medium- and long-term projects aligned with strategic business goals, delivering measurable results - Creates business justifications for an enterprise-scope, end-to-end BI including data management |
|
How does a BI strategy benefit a LOB? |
- Have departmental spend go further and contribute to the enterprise investments required - A departmental BI need often involves needing data from other groups. Solve the departmental pain points by removing the limits of a departmental focus through an enterprise-wide strategy - An enterprise BI approach provides a unified approach across all departments => "speak the same language" |
|
What can BI LOB-organization consist of? |
- Central BI competence centers - Decentralized BI groups / departments - BI governance committees |
|
Give some examples of why BI initiatives are complex endeavors |
- Disparate business data must be integrated and integration goes beyond simply bridging systems - It's about information consolidation and integrity as well as establishing an end-to-end view - Alignment across organizations regarding master data and KPIs needs to happen - New technology is introduced |
|
What three factors can implementation be categorized into? |
1. Organizational issues 2. Project issues 3. Technical issues |
|
What issues / common beliefs are to be considered when building a BI system? |
- ..that data warehousing database design is the same as transactional database design - Delivering data with overlapping and confusing definitions - ..promises of performance, capacity and scalability (a triangle of the three) - ..that your problems are over when the DWH is up and running - ..that Big Data makes the DWH obsolete |
|
What are the most common failure factors in BI projects? |
- Unclear business / information objectives - Low levels of data summarizations : getting lost in detail - Lack of (Top) Management Support - Lack of clear BI Strategy - Cultural issues being ignored - Inappropriate architecture |
|
Name some best practices for implementations |
- The project must fit with corporate strategy and business objectives - There must be complete buy-in to the project by executives, managers and users - It is important to manage user expectations about the completed project - The DWH should be built incrementally - Build in adaptability |
|
How are the best practices implemented? |
- The project must be managed by both IT and business professionals - Do not overlook training requirements - Be politically aware - Only load data that has been cleansed and is of a quality understood by the organization - Do not stop with technical system, but set up organizational support |
|
What are the main issues pertaining to scalability and what does good scalability mean? |
The main issues: - The amount of data in the system - How quickly the system is expected to grow - The number of concurrent users - The complexity of user queries Good scalability means that queries and other data-access functions will grow linearly with the size of the system |
|
What four main areas should effective security in a BI system focus on? |
- Establishing effective corporate and security policies and procedures - Implementing logical security procedures and techniques to restrict access - Limiting physical access to the data center environment - Establishing an effective internal control review process with an emphasis on security and privacy |
|
Describe the six steps of engineering projects. |
1. Justification (assess the business need) 2. Planning (develop strategic and tactical plans) 3. Business analysis (perform detailed analysis) 4. Design (conceive a product that solves the business problem or enables the business opportunity) 5. Construction (build the product) 6. Deployment (implement/sell the finished product and measure its effectiveness) => back to 1. |
|
How should the development steps of a BI project look like |
See slide 34 and 37 of lecture 7 |
|
What are the characteristics of routinization? |
- Repetitious work - Perceived as a normal part of employees' work activities - Standardized work - Incorporated into employees' work processes - Employees develop familiarity with the implemented IS |
|
What are the characteristics of infusion? |
- Realization of hidden value of an IS - Extension of the IS (e.g., developing additional features) - Infusion and routinization do not necessarily occur in sequence but rather occur in parallel |
|
What are the challenges for organizations regarding BI system post-adoption? |
Having the coexistence of routine (standard reports and so forth) and innovative (further and new insights) use of a BI system. |