Towards Automatic Smart-contract Codes Classification by Means of Word Embedding Model and Transaction Information

HUANG Bu-Tian; LIU Qi; HE Qin-Ming; LIU Zhen-Guang; CHEN Jian-Hai

doi:10.16383/j.aas.2017.c160655


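HUANG Bu-Tian^{1,2}, LIU Qi^{3}, HE Qin-Ming^{1}, LIU Zhen-Guang^{3}, CHEN Jian-Hai^{1}

1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310007, China
2. Hangzhou Yunxiang Network Technology Co., Ltd., Hangzhou 310012, China
3. School of Computing, National University of Singapore, Singapore 119613, Singapore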


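Author biographies:
LIU Qi    Master student at the School of Computing, National University of Singapore. His main research interests are data mining and blockchain. E-mail: leuchine@gmail.com

HE Qin-Ming    Professor at the College of Computer Science and Technology, Zhejiang University. His main research interests are data mining, virtualization, and blockchain. E-mail: hqm@zju.edu.cn

LIU Zhen-Guang    Postdoctoral researcher at the School of Computing, National University of Singapore. His main research interests are data mining and blockchain. E-mail: zhenguangliu@zju.edu.cn

CHEN Jian-Hai    Lecturer at the College of Computer Science and Technology, Zhejiang University. His main research interests are virtualization, cloud computing, and blockchain. E-mail: chenjh919@zju.edu.cn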

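Corresponding author:
HUANG Bu-Tian    Ph.D. candidate at the College of Computer Science and Technology, Zhejiang University. His main research interests are virtualization, cloud computing, and blockchain. Corresponding author of this paper. E-mail: butine@zju.edu.cn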

Publication history
- Received: 2016-09-14
- Accepted: 2017-02-03
- Published: 2017-09-20


Abstract

As a breakthrough extension of blockchain technology, smart contracts allow users to implement customized code logic on the blockchain, which makes blockchain technology simpler and easier to use. As the amount of smart contract code grows rapidly, managing and organizing large numbers of contracts is becoming increasingly challenging. A code classification system built on artificial intelligence techniques can automatically assign code to categories based on its text, reducing the manual effort needed to manage and organize it. Taking the smart contracts on the Ethereum platform as an example, and because word embedding models can capture the semantic information of code, this paper proposes a smart contract classification system based on a word embedding model. In addition, since every smart contract is associated with a series of transactions, we use a contract's transaction information to understand its logical behavior more deeply. To the best of our knowledge, this is the first study of the automatic classification of smart contract code. Experimental results show that the system achieves satisfactory classification performance: with transaction information, the neural network classifier reaches per-category F1 scores between 0.823 and 0.943 (Table 1).

Keywords:
- smart contract
- code
- transaction information
- word embedding model
- neural network
- long short-term memory
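The abstract combines two signals per contract: the words of its source code and its transaction history. The sketch below shows one simple way such a combined feature vector could be built; it is not the authors' pipeline. The toy embedding table `EMBEDDINGS`, the camelCase tokenizer, mean pooling of word vectors, and the two transaction statistics in `transaction_features` are all hypothetical choices for illustration (according to its keywords, the paper's system feeds code into an LSTM-based neural network rather than averaging word vectors).

```python
import math
import re

# Hypothetical toy embedding table (word -> 3-d vector). A real system would
# train word vectors on a corpus of contract source code.
EMBEDDINGS = {
    "transfer": [0.9, 0.1, 0.0],
    "balance": [0.8, 0.2, 0.1],
    "bet": [0.1, 0.9, 0.3],
    "random": [0.0, 0.8, 0.5],
}
DIM = 3

def tokenize(source):
    """Split Solidity-like source into lowercase word tokens.

    Identifiers are split on camelCase boundaries, runs of capitals
    (e.g. "ERC") and digit runs, so "transferBalance" -> ["transfer", "balance"].
    """
    tokens = []
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", ident):
            tokens.append(part.lower())
    return tokens

def code_vector(tokens):
    """Mean of the embeddings of known tokens (zero vector if none are known)."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(v[k] for v in known) / len(known) for k in range(DIM)]

def transaction_features(values_wei):
    """Hypothetical summary of a contract's transactions:
    log(1 + number of transactions) and the share that carry a nonzero value."""
    n = len(values_wei)
    paying = sum(1 for v in values_wei if v > 0)
    return [math.log1p(n), paying / n if n else 0.0]

def contract_features(source, values_wei):
    """Concatenate the code vector with the transaction features."""
    return code_vector(tokenize(source)) + transaction_features(values_wei)

print(contract_features("function transferBalance(uint amount)", [0, 10, 5, 0]))
```

Dropping the last two entries gives the "without transaction information" setting compared in the tables; a classifier then consumes either vector.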
Notes:

1) Recommended by Associate Editor YUAN Yong


Fig. 1 Ethereum blockchain

Fig. 2 System architecture

Fig. 3 LSTM unit
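Fig. 3 shows an LSTM unit, and the keywords list the long short-term memory model as part of the classifier. The page does not give the authors' exact cell equations, so the sketch below assumes the widely used LSTM formulation with input, forget, and output gates, written in plain Python for a tiny hidden size. The parameter layout (`params` holding one weight matrix and bias per gate, applied to the concatenation [x_t; h_{t-1}]) and the initialization in `init_params` are hypothetical choices. `encode` runs the cell over a sequence of word vectors and returns the last hidden state, one common way to turn a token sequence into a fixed-length vector for classification.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, v):
    """Compute W v + b for a weight matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state h_t and cell state c_t."""
    z = x + h_prev                                       # concatenation [x_t; h_{t-1}]
    i = [sigmoid(a) for a in affine(*params["i"], z)]    # input gate
    f = [sigmoid(a) for a in affine(*params["f"], z)]    # forget gate
    o = [sigmoid(a) for a in affine(*params["o"], z)]    # output gate
    g = [math.tanh(a) for a in affine(*params["g"], z)]  # candidate cell value
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical initialization: weights uniform in [-0.1, 0.1], zero biases."""
    rng = random.Random(seed)
    params = {}
    for gate in ("i", "f", "o", "g"):
        W = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
             for _ in range(hidden_dim)]
        params[gate] = (W, [0.0] * hidden_dim)
    return params

def encode(sequence, params, hidden_dim):
    """Run the cell over a sequence of input vectors and return the last hidden state."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h

params = init_params(input_dim=3, hidden_dim=4)
print(encode([[0.9, 0.1, 0.0], [0.1, 0.9, 0.3]], params, hidden_dim=4))
```

In a trained classifier the weights would be learned by backpropagation and the final hidden state passed to an output layer over the categories; both steps are outside this sketch.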

Fig. 4 Labeling process

Fig. 5 Category statistics

Table 1 Classification performance of the neural network

	With transaction information				Without transaction information
Category	Precision	Recall	Accuracy	F1 score	Precision	Recall	Accuracy	F1 score
Finance	0.943	0.945	0.942	0.943	0.872	0.868	0.882	0.869
Games	0.924	0.897	0.924	0.910	0.895	0.874	0.886	0.884
Lottery	0.882	0.891	0.906	0.886	0.835	0.852	0.875	0.843
Ethereum tools	0.914	0.921	0.929	0.917	0.854	0.871	0.882	0.862
Information management	0.862	0.842	0.883	0.852	0.805	0.813	0.829	0.809
Currency	0.914	0.882	0.917	0.898	0.821	0.809	0.834	0.814
Entertainment	0.873	0.889	0.893	0.881	0.783	0.763	0.792	0.773
Internet of Things	0.861	0.845	0.882	0.853	0.796	0.771	0.809	0.783
Other	0.832	0.814	0.845	0.823	0.753	0.757	0.791	0.754
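The F1 scores in Table 1 are consistent with the standard definition F1 = 2PR / (P + R), the harmonic mean of precision P and recall R. The sketch below recomputes F1 for two rows as a sanity check (agreement is within about 0.001, as expected from three-decimal rounding) and takes the unweighted (macro) mean of the reported per-category F1 scores to summarize the gain from transaction information. The unweighted mean is an illustrative summary, not a figure reported in the paper; category sizes are not given on this page, so a size-weighted average cannot be computed here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_mean(values):
    """Unweighted mean over categories."""
    return sum(values) / len(values)

# (precision, recall, reported F1) for two rows of Table 1, with transaction information.
checks = {
    "Finance": (0.943, 0.945, 0.943),
    "Other": (0.832, 0.814, 0.823),
}
for name, (p, r, reported) in checks.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.3f}, reported = {reported:.3f}")

# Reported per-category F1 scores from Table 1, in table order (Finance ... Other).
f1_with_tx = [0.943, 0.910, 0.886, 0.917, 0.852, 0.898, 0.881, 0.853, 0.823]
f1_without_tx = [0.869, 0.884, 0.843, 0.862, 0.809, 0.814, 0.773, 0.783, 0.754]
print(f"macro F1 with transactions:    {macro_mean(f1_with_tx):.3f}")     # 0.885
print(f"macro F1 without transactions: {macro_mean(f1_without_tx):.3f}")  # 0.821
```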

Table 2 Classification performance of naive Bayes

	With transaction information				Without transaction information
Category	Precision	Recall	Accuracy	F1 score	Precision	Recall	Accuracy	F1 score
Finance	0.862	0.893	0.861	0.877	0.861	0.815	0.862	0.837
Games	0.866	0.879	0.883	0.872	0.815	0.826	0.837	0.820
Lottery	0.821	0.817	0.846	0.819	0.796	0.805	0.822	0.800
Ethereum tools	0.884	0.854	0.896	0.868	0.825	0.847	0.861	0.835
Information management	0.829	0.859	0.860	0.852	0.757	0.771	0.796	0.764
Currency	0.876	0.853	0.896	0.864	0.760	0.765	0.774	0.762
Entertainment	0.845	0.864	0.872	0.854	0.716	0.725	0.735	0.720
Internet of Things	0.826	0.843	0.862	0.834	0.746	0.741	0.759	0.743
Other	0.784	0.819	0.825	0.801	0.745	0.737	0.763	0.740
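Table 2 reports a naive Bayes baseline. This page does not state which naive Bayes variant or features were used, so the sketch below assumes the common multinomial model over code tokens with add-alpha (Laplace) smoothing, predicting the class with the largest log posterior. The toy documents and labels at the end are invented for illustration only.

```python
import math
from collections import Counter

class MultinomialNB:
    """Multinomial naive Bayes over token lists with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        labels = list(labels)
        self.classes = sorted(set(labels))
        self.n_docs = len(labels)
        self.doc_count = Counter(labels)  # documents per class, for the priors
        self.word_count = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.word_count[label].update(tokens)
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.word_count[c])
        self.total = {c: sum(self.word_count[c].values()) for c in self.classes}
        return self

    def log_posterior(self, tokens, c):
        """Unnormalized log P(c | tokens) = log P(c) + sum of log P(word | c)."""
        score = math.log(self.doc_count[c] / self.n_docs)
        denom = self.total[c] + self.alpha * len(self.vocab)
        for w in tokens:
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_count[c][w] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        return max(self.classes, key=lambda c: self.log_posterior(tokens, c))

# Invented toy data: token lists from two hypothetical contract categories.
docs = [["transfer", "balance", "token"], ["bet", "random", "prize"],
        ["token", "supply", "transfer"], ["lottery", "random", "bet"]]
labels = ["finance", "lottery", "finance", "lottery"]
model = MultinomialNB().fit(docs, labels)
print(model.predict(["transfer", "token"]))  # finance
```

Working in log space avoids numerical underflow when a contract contains many tokens.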

Table 3 Classification performance of the support vector machine

	With transaction information				Without transaction information
Category	Precision	Recall	Accuracy	F1 score	Precision	Recall	Accuracy	F1 score
Finance	0.875	0.897	0.906	0.885	0.815	0.831	0.842	0.822
Games	0.883	0.835	0.876	0.858	0.845	0.821	0.856	0.832
Lottery	0.879	0.846	0.887	0.862	0.855	0.793	0.814	0.822
Ethereum tools	0.861	0.865	0.891	0.862	0.829	0.827	0.836	0.827
Information management	0.804	0.863	0.877	0.832	0.764	0.786	0.789	0.774
Currency	0.872	0.862	0.889	0.866	0.787	0.792	0.803	0.789
Entertainment	0.863	0.859	0.873	0.860	0.708	0.714	0.726	0.710
Internet of Things	0.829	0.845	0.867	0.836	0.756	0.758	0.763	0.756
Other	0.804	0.821	0.856	0.812	0.731	0.727	0.734	0.728
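Table 3 reports a support vector machine baseline; its kernel, features, and solver are not described on this page. As an assumption-labeled sketch, the code below trains a linear SVM (hinge loss with L2 regularization) using the Pegasos stochastic sub-gradient method in plain Python. It is binary; for the nine categories in Table 3, a common extension is one-vs-rest, training one such model per category and predicting the category with the highest score. The regularization `lam`, the number of `epochs`, and the toy points are hypothetical.

```python
import random

def pegasos_train(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos. X: list of feature lists; y: labels in {-1, +1}.

    A constant 1.0 feature is appended so the last weight acts as a bias
    (it is regularized too, a common simplification).
    """
    rng = random.Random(seed)
    data = [(x + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))  # uses the old w
            w = [(1 - eta * lam) * wi for wi in w]  # gradient step on the L2 term
            if margin < 1:  # hinge loss is active: step toward the example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Sign of the decision function w . [x; 1]."""
    score = sum(wi * xi for wi, xi in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1

X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w = pegasos_train(X, y)
print([predict(w, x) for x in X])
```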

