SIA OpenIR > Industrial Control Network and System Laboratory
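Alternative Title: Research and Implementation of Big Data Preprocessing Algorithm in Power Grid
Thesis Advisor: 王忠锋 (Wang Zhongfeng)
Keywords: power grid data; data quality; data cleaning; data reduction
Degree Discipline: Mechanical Engineering
Degree Name: Professional Master's Degree
Degree Grantor: Shenyang Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Shenyang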
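Abstract: With the rapid development of information and communication technology, the volume of data generated across all industries has grown sharply. Humanity has entered the era of big data, and data has become an important asset and competitive resource for every industry. Big data has likewise had a major impact on the electric power industry: to capture the benefits of massive power data, data processing techniques such as data analysis and data mining have gradually been introduced into power systems. Raw data produced in the power grid often contains a great deal of "dirty data", which hampers subsequent analysis. To eliminate these adverse effects, the collected raw data must be preprocessed to improve its quality. In current practical power grid data processing, however, workable data preprocessing algorithms are lacking, and many data processing algorithms are still at an exploratory stage. To address these problems, this thesis designs a data preprocessing algorithm suited to power systems. It comprises three main stages, each improving data quality through a dedicated algorithm, and ultimately yields high-quality data. The main contributions are as follows:
1. For assessing power grid data quality, formulas for four commonly used data quality indicators, together with methods for improving each, are proposed. To determine an effective processing order, a data quality improvement algorithm based on a greedy strategy is proposed; it effectively improves data quality while reducing computational cost and processing time.
2. For outlier detection in power grid data, a one-class support vector machine (OC-SVM) is used to identify outliers, and an improved particle swarm optimization (PSO) algorithm is used to optimize its parameters. The OC-SVM tuned by the improved PSO detects outliers in power grid data with higher accuracy.
3. For power grid data reduction, an adaptive autoencoder algorithm based on an influence factor is proposed. It lowers the autoencoder's error and improves dimensionality reduction, ultimately yielding a high-performance dimensionality reduction model.
Verification experiments were conducted for all of the above work. They demonstrate the effectiveness of the proposed data preprocessing algorithm and lay a solid foundation for subsequent data analysis and mining.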
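The abstract names neither the four quality indicators nor the details of the greedy ordering, so the following is only a minimal sketch under stated assumptions. It uses three hypothetical indicators (completeness, uniqueness by timestamp, and validity against an assumed 0 to 500 range), three hypothetical cleaning operations with made-up costs, and a greedy rule that repeatedly applies the remaining operation with the largest quality gain per unit cost, stopping once no operation helps. All names, ranges, and costs are illustrative, not the thesis's.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple

# A record is (timestamp, reading); None marks a missing reading.
Record = Tuple[int, Optional[float]]

# Hypothetical plausible range for a reading; an assumption for illustration.
LOW, HIGH = 0.0, 500.0


def completeness(rows: List[Record]) -> float:
    """Fraction of records whose reading is present."""
    return sum(v is not None for _, v in rows) / len(rows) if rows else 1.0


def uniqueness(rows: List[Record]) -> float:
    """Fraction of distinct timestamps; repeated timestamps count as duplicates."""
    return len({t for t, _ in rows}) / len(rows) if rows else 1.0


def validity(rows: List[Record]) -> float:
    """Fraction of present readings that fall inside [LOW, HIGH]."""
    present = [v for _, v in rows if v is not None]
    if not present:
        return 1.0
    return sum(LOW <= v <= HIGH for v in present) / len(present)


def quality(rows: List[Record]) -> float:
    # Unweighted mean of the indicators (a simplifying assumption).
    return (completeness(rows) + uniqueness(rows) + validity(rows)) / 3


def dedupe(rows: List[Record]) -> List[Record]:
    """Keep the first record for each timestamp."""
    seen, out = set(), []
    for t, v in rows:
        if t not in seen:
            seen.add(t)
            out.append((t, v))
    return out


def mask_invalid(rows: List[Record]) -> List[Record]:
    """Turn out-of-range readings into missing values."""
    return [(t, v if v is None or LOW <= v <= HIGH else None) for t, v in rows]


def fill_missing(rows: List[Record]) -> List[Record]:
    """Impute gaps with the mean of present readings (distorted if spikes remain)."""
    present = [v for _, v in rows if v is not None]
    if not present:
        return list(rows)
    fill = mean(present)
    return [(t, fill if v is None else v) for t, v in rows]


# Hypothetical relative costs of each operation.
OPERATIONS: Dict[str, Tuple[Callable[[List[Record]], List[Record]], float]] = {
    "dedupe": (dedupe, 1.0),
    "mask_invalid": (mask_invalid, 1.0),
    "fill_missing": (fill_missing, 2.0),
}


def greedy_order(rows: List[Record]) -> Tuple[List[str], float]:
    """Greedily apply the remaining operation with the best quality gain per unit cost."""
    remaining = dict(OPERATIONS)
    order: List[str] = []
    current = quality(rows)
    while remaining:
        best = None  # (gain per cost, name, resulting rows, resulting quality)
        for name, (op, cost) in remaining.items():
            candidate = op(rows)
            q = quality(candidate)
            ratio = (q - current) / cost
            if best is None or ratio > best[0]:
                best = (ratio, name, candidate, q)
        _, name, candidate, q = best
        if q <= current:
            break  # nothing left improves quality, so skip the remaining work
        rows, current = candidate, q
        order.append(name)
        del remaining[name]
    return order, current


if __name__ == "__main__":
    raw = [(0, 220.0), (1, None), (1, None), (2, 9999.0), (3, 230.0)]
    order, q = greedy_order(raw)
    print(order, round(q, 3))  # ['dedupe', 'mask_invalid', 'fill_missing'] 1.0
```

On this toy data the greedy rule deduplicates first, then masks the 9999.0 spike before imputing, so the spike does not distort the mean used to fill gaps. In the second step masking and filling give the same quality gain, and masking wins only because its hypothetical cost is lower.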
Other Abstract: With the rapid development of information and communication technology, the amount of data generated by all walks of life has increased dramatically. Humans have entered the era of big data, and data has become an important asset and competitive resource for industry. Big data also has a major impact on the power industry. To obtain industry gains from massive power data, data processing technologies such as data analysis and data mining have gradually been introduced into the power system. The raw data generated in the power grid often contains a lot of "dirty data", which causes difficulties for subsequent data analysis. To eliminate the adverse effects of "dirty data", the collected raw data must be preprocessed to improve data quality. At present, practical power grid data processing lacks workable data preprocessing algorithms, and many data processing algorithms are still in the exploration stage. To solve these problems, this thesis designs a data preprocessing algorithm suitable for the power system. It consists of three main processes that improve data quality through three algorithms and ultimately produce high-quality data. The main work of this thesis is as follows:
1. For the problem of power grid data quality discrimination, formulas for calculating four commonly used data quality indices and methods for improving them are proposed. To determine an effective data processing sequence, a data quality improvement algorithm based on a greedy strategy is proposed, which effectively improves data quality and reduces computational cost and processing time.
2. To detect outliers in power grid data, a one-class support vector machine is used to identify outliers, and the particle swarm optimization algorithm is improved for parameter optimization. The one-class support vector machine based on the improved particle swarm optimization algorithm is then used to detect outliers in power grid data, which improves the accuracy of outlier detection.
3. For power grid data reduction, an adaptive autoencoder algorithm based on an influence factor is proposed, which reduces the error of the autoencoder, improves the effect of data dimensionality reduction, and finally yields a high-performance data dimensionality reduction model.
All of the above research work has been verified by experiments, which prove the validity of the proposed data preprocessing algorithm and lay a good foundation for the next step of data analysis and mining.
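The abstract says the OC-SVM's parameters are tuned with an improved PSO but does not describe the improvement. As a stand-in, the sketch below uses a linearly decreasing inertia weight, a widely used PSO modification that may differ from the thesis's. The objective `surrogate_error` is a hypothetical smooth function over (ν, log10 γ); in practice it would be a validation error of an OC-SVM trained with those parameters, for example scikit-learn's `OneClassSVM(nu=..., gamma=...)`. The function names, bounds, and coefficients are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    f: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    iterations: int = 100,
    w_max: float = 0.9,
    w_min: float = 0.4,
    c1: float = 1.5,
    c2: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best PSO with a linearly decreasing inertia weight (one common variant)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for it in range(iterations):
        # Inertia falls from w_max to w_min: broad exploration early, fine search late.
        w = w_max - (w_max - w_min) * it / max(iterations - 1, 1)
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull toward own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # pull toward swarm best
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside the box
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:  # asynchronous update of the global best
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def surrogate_error(p: Sequence[float]) -> float:
    """Hypothetical stand-in for OC-SVM validation error; p = (nu, log10(gamma))."""
    nu, log_gamma = p
    return (nu - 0.1) ** 2 + (log_gamma + 1.0) ** 2


if __name__ == "__main__":
    best, err = pso_minimize(surrogate_error, bounds=[(0.01, 0.5), (-4.0, 2.0)])
    nu, gamma = best[0], 10 ** best[1]
    print(f"nu={nu:.3f} gamma={gamma:.4f} surrogate error={err:.2e}")
```

Searching log10(γ) rather than γ keeps the swarm's step sizes meaningful across several orders of magnitude; this is a design choice of the sketch, not something the abstract specifies.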
Contribution Rank: 1
Document Type: Thesis
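Recommended Citation (GB/T 7714):
付亚同. 电网大数据预处理算法研究与实现[D]. 沈阳: 中国科学院沈阳自动化研究所, 2020.
FU Yatong. Research and Implementation of Big Data Preprocessing Algorithm in Power Grid[D]. Shenyang: Shenyang Institute of Automation, Chinese Academy of Sciences, 2020.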
Files in This Item:
电网大数据预处理算法研究与实现.pdf (1330 KB); Document type: thesis; Access: open access; License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.