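PAI & DataWorks: Understanding Integrated Big Data and AI

AI "iPhone"
Unified management: resources, compute, operations
Generative AI, intelligent computing, scientific computing

Core goal of an intelligent computing platform: improve two kinds of efficiency
AI resource efficiency: how distributed AI runs
AI human efficiency: how AI gets written
+80%=20%

Model-centric AI vs. Data-centric AI (figure labels: Model, Data)
Source: A Chat with Andrew on MLOps: From Model-centric to Data-centric AI

Model-centric ML:
  Working on code is the central objective
  Optimizing the model so it can deal with the noise in the data
  Inconsistent data labels
  Data is fixed after standard preprocessing
  Model is improved iteratively

Data-centric ML:
  Working on data is the central objective
  Rather than gathering more data, more investment is being made in data quality tools to work on noisy data
  Data consistency is key
  Code/algorithms are fixed
  Data quality is improved iteratively

Platform architecture (diagram labels)
Clusters: ODPS big data computing cluster, PAI-Lingjun intelligent computing cluster, cloud-native general-purpose computing cluster
Platforms: DataWorks, PAI
AI applications: intelligent recommendation (PAI-Rec), open search (OpenSearch), PAI-API, PAI-Lingji (灵积), PAI-智码实验室, Llama, AI for Science
Layers: ODPS, PAI, AI PaaS
Big data:
  Massive-scale data processing (SQL, Python): ODPS-MaxCompute
  Interactive data analytics: ODPS-Hologres
  Flink
  Structured data integration
  Hadoop data migration
  EMR (Spark, StarRocks), DLF, Elasticsearch ecosystem
Infrastructure: CPU, GPU, RDMA, CPFS, OSS, NAS
AI engineering:
  Model serving: PAI-EAS
  Intelligent computing service: PAI-Lingjun
  AI acceleration engine: PAI-ACC
  Distributed training: PAI-DLC
  Interactive modeling: PAI-DSW
  Visual modeling: PAI-Designer
  MLOps: PAI-QuickStart
  Data labeling: PAI-iTAG
  Feature engineering: PAI-FeatureStore
DataWorks (DW): data modeling, data development, data governance, data quality, data security

1. MC / DW / PAI / Flink / Holo
Input documents: PDF, TXT, QA/PAI
Knowledge base build: Text, CHUNK, Q&A, embeddings
Embedding Model: BGE / SGPT / text2vec
Vector table columns: id, content, embedding, doc_id
Sample rows (content, embedding, doc_id):
  PAI ..., [0.1, -0.1, 0.1], PAI
  mapjoin ..., [0.5, 0.2, 0.9], MC
  PAI ..., [0.8, -0.1, 0.7], PAI
  Holo ..., [0.6, 0.9, -1.1], Holo
Vector store: Holo / Elasticsearch / AnalyticDB / FAISS
Query side: PAI + query, embeddings (Embedding Model: BGE / SGPT / text2vec)
Prompt Engineering / BladeLLM: query + items 1, 2, 3
LLM + SFT: ChatGPT / Qwen / ...
LLM output: 1. PAI ... 2. PAI ... 3. PAI ...
Resources: CPU / GPU

2. Recommendation pipeline (diagram labels)
User behavior logs
Real-time computing: Flink
Offline computing: MaxCompute
Feature store (unified batch and streaming)
Sample generation (Flink)
Sample store (unified batch and streaming)
Model training (PAI-TF)
Data analysis
Model hub (1 to n models)
Model validation
Model deployment
Online prediction

Support for multiple resource types, AI + big data
PAI-Designer effectively connects big data and AI
Product forms and resources: ECS/EGS, PAI, MaxCompute, Flink
Multiple ways to build a flow: Pipeline Spec, PaiFlow SDK, GUI (PAI-Designer)
PAIFlow diagram labels: User, Pipeline Spec, HTTP API, PAIFlow Service, Pipeline Run, DAG, Job, EAS Service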
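The vector table in the document Q&A scenario (id, content, embedding, doc_id) exists so that a query embedding can be matched against stored chunk embeddings before the prompt is assembled. A minimal sketch of that lookup follows, in plain Python rather than Hologres, Elasticsearch, AnalyticDB, or FAISS. The embedding values and doc_id labels come from the slide; the numeric ids, the query vector, and the helper names `cosine_similarity`, `top_k`, and `build_prompt` are assumptions made for illustration, and the prompt wording is hypothetical.

```python
import math

# Sample rows from the slide. The numeric ids are assumed; the slide's
# id values and full content text did not survive extraction.
ROWS = [
    {"id": 1, "content": "PAI ...", "embedding": [0.1, -0.1, 0.1], "doc_id": "PAI"},
    {"id": 2, "content": "mapjoin ...", "embedding": [0.5, 0.2, 0.9], "doc_id": "MC"},
    {"id": 3, "content": "PAI ...", "embedding": [0.8, -0.1, 0.7], "doc_id": "PAI"},
    {"id": 4, "content": "Holo ...", "embedding": [0.6, 0.9, -1.1], "doc_id": "Holo"},
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_embedding, rows, k=3):
    """Return the k rows whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_embedding, row["embedding"]), row) for row in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:k]]


def build_prompt(question, retrieved):
    """Number the retrieved chunks and append the question (hypothetical template)."""
    context = "\n".join(f"{i}. {row['content']}" for i, row in enumerate(retrieved, 1))
    return f"Context:\n{context}\nQuestion: {question}"
```

In a deployment the brute-force scan in `top_k` would be replaced by the vector store's own nearest-neighbour search, and the query vector would come from the same embedding model that produced the stored rows.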
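The PAIFlow labels (Pipeline Spec, Pipeline Run, DAG, Job) suggest that a pipeline is described as a graph of steps and executed in dependency order. The sketch below illustrates that idea with the standard-library `graphlib` module (Python 3.9+). It does not use the PaiFlow SDK or its spec format: the step names are taken loosely from the recommendation pipeline labels, and the dependency edges, `run_order`, and `run_pipeline` are assumptions for illustration only.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
# The edges are assumed; the slide lists the components without arrows.
PIPELINE = {
    "feature_store": {"user_behavior_logs"},
    "sample_generation": {"feature_store"},
    "model_training": {"sample_generation"},
    "model_validation": {"model_training"},
    "model_deployment": {"model_validation"},
    "online_prediction": {"model_deployment"},
}


def run_order(dag):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


def run_pipeline(dag, jobs):
    """Run each step's job after its dependencies, passing upstream results in.

    `jobs` maps a step name to a callable that takes a dict of
    {dependency name: dependency result} and returns this step's result.
    """
    results = {}
    for step in run_order(dag):
        upstream = {dep: results[dep] for dep in dag.get(step, ())}
        results[step] = jobs[step](upstream)
    return results
```

`TopologicalSorter` raises `graphlib.CycleError` if the steps depend on each other circularly, which is the kind of check a pipeline service would need before scheduling any job.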