With the rapid development of information and communication technology, the amount of data generated across all sectors has increased dramatically. We have entered the era of big data, in which data has become an important asset and competitive resource for industry, and the power industry is no exception. To extract value from massive power data, data processing technologies such as data analysis and data mining have gradually been introduced into the power system. However, the raw data generated in the power grid often contains a large amount of "dirty data", which hampers subsequent data analysis. To eliminate its adverse effects, the collected raw data must be preprocessed to improve data quality. At present, practical data preprocessing algorithms are lacking in real-world power grid data processing, and many data processing algorithms are still at the exploratory stage. To address these problems, this paper designs a data preprocessing algorithm suited to the power system. It consists of three main processes, each implemented by its own algorithm, which together improve data quality and ultimately yield high-quality data. The main contributions of this paper are:
1. To assess power grid data quality, calculation formulas and improvement methods are given for four commonly used data quality indices. To determine an effective data processing sequence, a data quality improvement algorithm based on a greedy strategy is proposed, which effectively improves data quality while reducing computational cost and processing time.
2. To detect outliers in power grid data, a one-class support vector machine is used to identify outliers, with its parameters optimized by an improved particle swarm optimization algorithm. The resulting detector improves the accuracy of outlier detection.
3. For power grid data reduction, an adaptive autoencoder algorithm based on an influence factor is proposed, which reduces the autoencoder's error, improves the effect of dimensionality reduction, and yields a high-performance dimensionality reduction model.
All of the above work has been verified experimentally. The experiments demonstrate the validity of the proposed data preprocessing algorithm and lay a good foundation for subsequent data analysis and mining.
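To make the parameter optimization step in the outlier detection contribution more concrete, the following is a minimal sketch of standard particle swarm optimization, not the improved variant proposed in this paper, whose details are not given here. The function names `pso_minimize` and `toy_error`, the hyperparameter values, and the search bounds are all illustrative assumptions. In particular, `toy_error` is a stand-in for the validation error of a one-class support vector machine as a function of two parameters, not a real detector.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=200,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard (unimproved) PSO; all default values are illustrative."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive pull + social pull.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_error(params):
    # Hypothetical stand-in for a detector's validation error as a
    # function of two parameters; its minimum is placed at (0.1, 0.5).
    nu, gamma = params
    return (nu - 0.1) ** 2 + (gamma - 0.5) ** 2


best_params, best_err = pso_minimize(toy_error, [(0.01, 0.5), (0.001, 2.0)])
print(best_params, best_err)
```

In an actual pipeline, `toy_error` would be replaced by a routine that trains the detector with the candidate parameters and returns its error on held-out data, which is far more expensive per evaluation than this toy objective.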