Wenfei Fan
Shenzhen Institute of Computing Sciences
University of Edinburgh
Beihang University

Big Data: From Theory to Systems

The 5 Vs of Big Data

The study has raised as many questions as it has answered

Volume: The size of data grows rapidly and continuously
    China generated 23.9 ZB of business data in 2022. It is expected to reach 76.6 ZB in 2027
Velocity: "You cannot afford to make decisions based on yesterday's data"
    Healthcare, retail, financial services, cyber security
Variety: Relational database D, transaction graph G
    Can we write a query across D and G in SQL?
Veracity: The most challenging issue among the 5 Vs
    Real-life data is dirty: semantic inconsistencies, duplicates, stale data, missing links
Value: Killer apps?
    What practical value can we get out of big data?

Big Data: Volume, Variety, Velocity, Veracity, Value

The challenges introduced by the digital economy

Digital Currency
    Heterogeneous queries on big data across different models
    Real-time transaction processing with consistency and reliability requirements
    Data-driven fraud detection and intelligent analysis

Challenges:
    How to query big data with limited resources? (Volume)
    How to answer queries across heterogeneous data models? (Variety)
    How to query dynamic data in response to updates? (Velocity)
    How to clean dirty data? (Veracity)
    What is the benefit of big data analytics? (Value)

Smart City
    Fusion of data from various models (historical BIM/CIM, and newly collected data)
    Massive data from unreliable data sources
    Real-time analysis in response to updates

The need for both theory and systems for big data analytics

The challenges introduced by AIGC

ChatGPT has led to a large number of AIGC startups
    73% of startups in China focus on application domains, and 14% on LLMs.
    Most LLMs are developed via fine-tuning of open-source pre-trained models.
To make practical use of AIG