Big data analytics is often prohibitively costly. It is typically conducted by parallel processing with a cluster of machines, and is considered a privilege of big companies that can afford the resources. This position paper argues that big data analytics is accessible to small companies with constrained resources. As evidence, we present BEAS, a framework for querying big relations with constrained resources, based on bounded evaluation and data-driven approximation.
Yang Cao, Wenfei Fan, Tengfei Yuan. Is Big Data Analytics Beyond the Reach of Small Companies? Data Analysis and Knowledge Discovery, 2017, 1(9): 1-7.