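Legal Notice
This document must not be used for any form of academic misconduct, including but not limited to cheating on exams, ghostwriting assignments, and plagiarizing papers. Readers should comply with academic integrity policies and applicable laws and regulations.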
What is Data
- Definition: Individual units of information.
- Representation: Usually by variables.
- Sources:
  - Scientific research
  - Business management
  - Government
What is Data Analytics
- Definition: The process of inspecting, cleaning, transforming, and modelling data.
- Goal: To discover useful information, inform conclusions, and support decision-making.
- Characteristics:
  - Input: Data-driven. More data generally improves the reliability and credibility of the results.
  - Methods: Interdisciplinary, combining mathematics and computer science.
  - Output: Knowledge or actionable information discovered from the data.
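As a minimal sketch of the four steps named in the definition above (inspect, clean, transform, model), the following Python example runs them on a small made-up dataset. The records, the function names, and the choice of a least-squares line as the "model" are all illustrative assumptions, not part of the original notes.

```python
from statistics import mean

# Hypothetical records: (hours_studied, exam_score). None marks a missing value.
RAW_RECORDS = [
    (1.0, 52.0), (2.0, 55.0), (None, 61.0), (3.0, 61.0),
    (4.0, 64.0), (5.0, None), (6.0, 70.0),
]

def inspect_rows(rows):
    """Inspect: count the rows and the missing values in each column."""
    missing = [sum(1 for r in rows if r[i] is None) for i in range(2)]
    return {"rows": len(rows), "missing": missing}

def clean(rows):
    """Clean: drop every row that contains a missing value."""
    return [r for r in rows if None not in r]

def transform(rows):
    """Transform: center x on its mean so the intercept equals the mean of y."""
    x_mean = mean(x for x, _ in rows)
    return [(x - x_mean, y) for x, y in rows]

def fit_line(rows):
    """Model: fit y = slope * x + intercept by ordinary least squares."""
    x_mean = mean(x for x, _ in rows)
    y_mean = mean(y for _, y in rows)
    sxx = sum((x - x_mean) ** 2 for x, _ in rows)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in rows)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

print(inspect_rows(RAW_RECORDS))
slope, intercept = fit_line(transform(clean(RAW_RECORDS)))
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The fitted slope is the kind of "useful information" the goal refers to: under these made-up numbers it estimates how much the score changes per extra hour. Dropping incomplete rows is only one possible cleaning strategy; imputing missing values would be another.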
The History of Data Analytics
- Data analytics saves people’s lives:
  - Example: Cholera outbreak investigation.
- Big data challenge:
  - U.S. Census → Deployment of the Hollerith Tabulating Machine.
- Leap forward:
  - Manhattan Project: Catalyst for “big science”.
  - Space Program.
- The Era of Big Data:
  - Better models: Rule-based → Statistical → Deep Learning (increasing the number of variables to fit data more accurately).
  - Better computing resources: More powerful RAM, CPU, GPU, etc.
  - More data.
Characteristics of Big Data: 4V
- Volume: Enormous amounts of data to process.
- Velocity: Batch, real-time, and streaming data, with response times from milliseconds to seconds.
- Variety: Structured, semi-structured, and unstructured data (e.g., text, images, multimedia).
- Veracity: Data consistency, completeness, and cleanliness.
Applications of Data Analytics
- Product Recommendation: Recommend items to customer X based on items previously rated highly by X.
- Ranking Pages: Compute the importance of web pages using link graph analysis (e.g., PageRank).
- Artificial Intelligence: An application area that uses data analytics to build intelligent systems.
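To make the page-ranking application more concrete, here is a minimal sketch of PageRank computed by power iteration on a tiny link graph. The four-page graph, the function name `pagerank`, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for illustration; production systems work on far larger graphs and use convergence checks.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a dict page -> score for a graph given as page -> list of outgoing links."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives a baseline share from the "random jump" term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new_rank[q] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank

# Hypothetical link graph: A links to B and C, B to C, C to A, D to C.
GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(GRAPH)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

In this toy graph, page C collects links from three other pages and ends up with the highest score, while D, which no page links to, keeps only the baseline share. The scores sum to 1, so they can be read as the long-run probability that a random surfer is on each page.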