What's New in Apache Spark 4.0?
Daniel Tenedorio (dtenedor), Wenchen Fan (cloud-fan)
Data + AI Summit 2025
© 2025 Databricks Inc. All rights reserved

Announcing Apache Spark 4.0

Enhanced SQL & Querying
    SQL UDFs
    PIPE Syntax
    SQL Scripting
    Dynamic Session Variables
Spark Connect Enhancements
    spark.api.mode=connect
    Swift, Rust, Go Clients
    ML on Spark Connect
    Lightweight Python Client
Productivity & Standards
    VARIANT Data Type
    Structured Logging
    Language-Aware Collations
    ANSI SQL Mode by Default
Python API Power-Ups
    Native Plotting with Plotly
    Python DataSource API
    Polymorphic Python UDTFs
    PySpark UDF Unified Profiling
Streaming & Stateful Innovation
    transformWithState API
    State Data Source
    State Store Improvements
    Improved Checkpointing
Ecosystem & Developer Reach
    applyInArrow
    DF.toArrow
    Pandas 2.x Support
    JDK 21 Support
    Spark K8s Operator

Agenda

Spark Connect and Infrastructure
    Spark Connect, build, deployment, structured logging, REST API
ETL Features
    VARIANT type, XML data source, Python data source, data source v2 improvements, new streaming APIs
Data Science Features
    Native plotting, polymorphic Python UDTF, Python data sources, UDF profiling, improved error context, Pandas 2 upgrade and new DataFrame features
Data Warehouse Features
    SQL scripting, SQL UDFs, SQL pipe syntax, session variables, ANSI ON by default, string collations, EXECUTE IMMEDIATE

Spark Connect

Connect to Spark from Any App
Thin client, with full power of Apache Spark

[Diagram labels: Spark Connect Client API; Spark's Driver; Application Gateway; Analyzer; Optimizer; Scheduler; Distributed Execution Engine; Applications; IDEs/Notebooks; Programming Languages/SDKs; Modern data application]

100% Scala/Java API Parity
    Shared interfaces between Classic and Connect APIs
99% Feature Parity
    Most features work in both Classic and Connect
SPARK-41279 Feature parity
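
To make the thin-client idea concrete, here is a minimal PySpark sketch (not from the slides). It assumes a Spark Connect server is already running and reachable at localhost on port 15002, the default Spark Connect port. The helper names connect_url and run_demo are hypothetical and exist only for this illustration. The spark.api.mode=connect setting listed earlier is a separate switch and is not exercised here.

```python
def connect_url(host: str = "localhost", port: int = 15002) -> str:
    """Build a Spark Connect URL of the form sc://host:port.

    Hypothetical helper for this sketch; the host and port are assumptions.
    """
    return f"sc://{host}:{port}"


def run_demo(url: str) -> list:
    """Open a Spark Connect session, run a tiny query, return rows as tuples."""
    # Imported lazily so connect_url() is usable without PySpark installed.
    from pyspark.sql import SparkSession

    # builder.remote(...) creates a Spark Connect (thin client) session:
    # the query plan is sent to the server, which analyzes and executes it.
    spark = SparkSession.builder.remote(url).getOrCreate()
    try:
        df = spark.range(5).selectExpr("id", "id * 2 AS doubled")
        return [tuple(row) for row in df.collect()]
    finally:
        # Release the client session whether or not the query succeeded.
        spark.stop()
```

With a server available, calling run_demo(connect_url()) should return five (id, doubled) pairs. The DataFrame code is the same as it would be against a classic session; only the way the session is created changes, which is the parity point the slide makes.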