Big Data Architecture Course Review Notes
Introduction
The requirements of big data systems include data requirements, functional requirements, performance requirements (high performance, high availability, high scalability, high fault tolerance, security, etc.), and computational scenario requirements.
The goal requirements of distributed systems/clusters and of big data processing are high performance, high availability, fault tolerance, and scalability.
High performance is measured by three metrics: response time (latency), throughput, and resource utilization.
High availability is measured by MTTF (mean time to failure) and MTTR (mean time to repair): availability = MTTF / (MTTF + MTTR).
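The availability formula can be checked with a few lines of Python. This is a minimal sketch for review purposes: the function name `availability`, the choice of hours as the unit, and the sample figures (MTTF of 999 hours, MTTR of 1 hour) are illustrative assumptions, not values from the course.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Return availability = MTTF / (MTTF + MTTR).

    Both arguments must use the same time unit (hours here, by assumption).
    """
    if mttf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTTF and MTTR must be non-negative")
    if mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF + MTTR must be greater than zero")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical system: fails on average every 999 hours, takes 1 hour to repair.
a = availability(999.0, 1.0)

# Expected downtime over a 365-day year, derived from the same ratio.
downtime_hours_per_year = (1.0 - a) * 365 * 24

print(f"availability = {a:.4f}")
print(f"expected downtime = {downtime_hours_per_year:.2f} hours/year")
```

The formula shows that availability improves either by failing less often (larger MTTF) or by recovering faster (smaller MTTR).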
The relationship between big data and cloud computing:
Cloud computing can provide abundant computing resources for big data processing.
Big data is a typical application of cloud computing services.
Big data can also be processed without using cloud computing.
Typical scenarios of big data computation:
Batch processing
Stream computing
Interactive querying
Static data is bounded, persistently stored, and large in volume, which makes it suitable for batch processing.
Stream data is unbounded and continuously generated, with no end in sight; it requires timely processing over data windows.
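The contrast between batch processing of bounded data and windowed processing of a stream can be sketched as follows. This is an illustrative sketch only: the event format `(timestamp_seconds, value)`, the function names, the fixed-size (tumbling) window, and the assumption that events arrive in timestamp order are all choices made for this example, not part of the course material or of any particular stream engine.

```python
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[int, float]  # (timestamp_seconds, value), a hypothetical format


def batch_sum(events: List[Event]) -> float:
    """Batch style: the data set is bounded, so it can be read in full."""
    return sum(value for _, value in events)


def tumbling_window_sums(
    stream: Iterable[Event], window_seconds: int
) -> Iterator[Tuple[int, float]]:
    """Stream style: emit (window_start, sum) each time a window closes.

    Assumes events arrive in non-decreasing timestamp order.
    """
    current_start = None
    total = 0.0
    for ts, value in stream:
        start = (ts // window_seconds) * window_seconds
        if current_start is None:
            current_start = start
        elif start != current_start:
            # An event from a later window arrived: close the current one.
            yield current_start, total
            current_start, total = start, 0.0
        total += value
    # A real stream never ends; this flush exists only because the
    # demo input below is finite.
    if current_start is not None:
        yield current_start, total


events = [(0, 1.0), (3, 2.0), (5, 4.0), (9, 1.0), (12, 3.0)]

print(batch_sum(events))
for window_start, window_total in tumbling_window_sums(iter(events), 5):
    print(window_start, window_total)
```

The batch function needs the whole data set before it can answer, while the windowed generator produces a partial result as soon as each window closes, which is what makes timely processing of unbounded data possible.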
Overview of Cloud Computing
Definition of Cloud Computing
Cloud computing is a business computing model. It distributes computing tasks across a resource pool composed of a large number of computers, allowing application systems to obtain computing power, storage space, and information services as needed. It provides dynamically scalable, inexpensive computing services on demand over the network, and represents a universally applicable mindset and model for resource management. Cloud computing likens computing resources to omnipresent clouds and is the result of the development and evolution of technologies such as virtualization, distributed computing, utility computing, load balancing, parallel computing, network storage, and hot backup redundancy.
Characteristics of Cloud Computing
Unified management of resource virtualization and pooling
Massive scale, high availability, high scalability
Elasticity, on-demand, self-service provisioning
Ubiquitous access, accurate billing, low cost
Three Service Models
Infrastructure as a Service (IaaS)
Provides computing resource services such as servers, storage, and networking.