Deconstructing Stream Storage
Flavio Junqueira, Senior Director, Senior Distinguished Engineer
滕昱 (Teng Yu), Director, Software Engineering
Dell Technologies

Storage Abstractions
Block abstraction: first disk storage units in the 1950s (650 RAMAC)
DEC's VAXcluster: pool of block-level storage in 1983
Storage Area Networks (SANs) are a more recent development (late 1990s)
[Timeline labels: Block; File, in the 1960s (Source: CTSS Programmer's Guide); Object, late 1990s ("A Cost-Effective, High-Bandwidth Storage Architecture", Gibson et al., ASPLOS 1998)]

Stream as a storage abstraction
"How hard is it to append to a file?" (A skeptical colleague)

Appending to a file
I/O write size matters for throughput
Batching is consequently desirable, but must be balanced against latency
Preallocation is desirable to avoid a performance penalty due to block allocation
Durability is critical
[Diagram labels: Read, Append]

Replicating for dependability
[Diagram labels: Append, Read]
Starts getting into distributed systems and replication problems
Lack of a correct protocol can lead to unsafe systems

Shared-nothing is limiting
[Diagram labels: Append, Read, Server 3, Server 2, Server 1]
Storage capacity of a single stream is limited by the capacity of individual servers

Scale-out storage
[Diagram labels: Scale-out Storage (file or object), Append, Read]
Removes an obstacle to unbounded streams

Plurality of data sets
[Diagram labels: Scale-out Storage (file or object), Query engines, Structured and unstructured data at rest, Stream]
Streams co-exist with other non-stream data sets, e.g., Apache Parquet files, Apache Iceberg tables
Enables use cases that join streaming data and historical/static data

Primary storage for all data
Streaming data lands in the data lake without data movement

Scale-out storage
Small writes per stream lead to poor performance
[Diagram labels: Scale-out Storage (file or object), Append, Read]

Scale-out storage + Log
[Diagram labels: Append, Read]
Durability while guaranteeing high throughput and low latency
Scale-out Storage(