Research and Implementation of High-Dimensional Data Visualization Technology
Original Title: 高维数据可视化技术研究与实现
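Wei Shichao (魏世超) 1,2
Department: Digital Factory Research Laboratory (数字工厂研究室)
Thesis Advisor: Li Xin (李歆)
Keyword: high-dimensional data; mixed-attribute data; dimensionality reduction visualization; visualization recommendation; industrial application
Pages: 74
Degree Discipline: Control Engineering
Degree Name: Professional Master's Degree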
2020-05-26
Degree Grantor: Shenyang Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Shenyang
Abstract: In the era of big data, data resemble a vast mineral deposit waiting to be mined. The data are not only large in volume; each record also has many dimensions, and the relationships among records are complex, which makes it hard to discover the value hidden in them. Data visualization, a relatively new technique in data analysis, maps data to colors and charts to which human perception is sensitive, and so provides an effective route to data understanding, data mining, and decision support. Researchers have developed visualization algorithms and systems for high-dimensional data, but these have limitations. First, common dimensionality reduction visualization methods map data from the high-dimensional space to a low-dimensional space using distance metrics; they place strict requirements on data types and handle the increasingly common complex data types poorly. Second, some existing visualization assistance systems show little intelligence and analyze the data as a whole inadequately, which leads to a one-sided understanding of the data and inaccurate visualizations. This thesis studies these problems of high-dimensional data visualization in three parts.

(1) Dimensionality reduction visualization is an effective means of analyzing high-dimensional data. The traditional t-distributed stochastic neighbor embedding (t-SNE) algorithm handles only data of a single attribute type and performs poorly on mixed-attribute data. To address this, an extended t-SNE dimensionality reduction visualization algorithm, E-t-SNE, is proposed for mixed-attribute data. First, information entropy is introduced to construct the distance matrix of the categorical attributes. Second, the categorical distance is combined with the Euclidean distance of the numerical attributes to construct the distance matrix of the mixed-attribute data. Finally, the new distance matrix is fed into t-SNE, which reduces the dimensionality and displays the data in two-dimensional space. To verify the algorithm, the K-nearest neighbor (KNN) algorithm is used to evaluate the reduced mixed data. Experiments on UCI datasets show that, for mixed-attribute data, the method not only visualizes well but also separates data of different classes into clusters after dimensionality reduction and improves the accuracy of subsequent classifiers.

(2) For most people without visualization expertise, visualizing high-dimensional data is difficult. Visualization recommendation aims to lower this barrier by automatically generating results for analysts to explore and choose from. This thesis proposes a machine-learning-based visualization recommendation method that learns the most meaningful visualizations from a collection of visualization practice datasets and labels them. First, 22 data features and the corresponding meaningful visualization types are extracted from 30 real visualization datasets. Then binary classifiers are trained to learn which visualizations are "meaningful", and their accuracy is tested on a crowdsourced test set. Finally, the results of multiple classifiers are fused by voting to select several "meaningful" charts suited to a dataset. Experiments show that the method learns the meaningful visualization types for a dataset, labels them, and recommends them to users, which greatly reduces the difficulty of data exploration.

(3) Based on the requirements of visual analysis of real industrial production data, the thesis introduces the technical foundations and design principles of industrial production data visualization and, drawing on the research above, designs several interfaces that display industrial production processes. The implemented interfaces are grounded in the data and in data analysis and exploit human intuition for color and charts; they help enterprises and decision makers monitor production processes and make management decisions, demonstrating the importance of data visualization in real production.
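The distance construction described in part (1) can be sketched as follows. The abstract does not give the E-t-SNE formulas, so everything specific here is an assumption made for illustration: categorical attributes are compared by simple mismatch, each mismatch is weighted by that attribute's normalized Shannon entropy, numerical attributes are min-max scaled before the Euclidean distance is taken, and the two parts are simply added. The names `attribute_entropy` and `mixed_distance_matrix` are hypothetical.

```python
import math
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy (in bits) of one categorical column."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mixed_distance_matrix(numeric, categorical):
    """Pairwise distance matrix for records with numeric and categorical parts.

    numeric:     list of rows, each a list of floats
    categorical: list of rows, each a list of hashable labels
    """
    n = len(numeric)

    # Assumption: min-max scale each numeric column so no column dominates.
    cols = list(zip(*numeric))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]
    scaled = [[(v - l) / s for v, l, s in zip(row, lo, span)] for row in numeric]

    # Assumption: weight each categorical column by its share of total entropy.
    ent = [attribute_entropy(col) for col in zip(*categorical)]
    total = sum(ent) or 1.0
    weights = [e / total for e in ent]

    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Euclidean distance over the scaled numeric attributes.
            d_num = math.sqrt(sum((a - b) ** 2 for a, b in zip(scaled[i], scaled[j])))
            # Entropy-weighted mismatch count over the categorical attributes.
            d_cat = sum(w for w, a, b in zip(weights, categorical[i], categorical[j]) if a != b)
            dist[i][j] = dist[j][i] = d_num + d_cat
    return dist


# Toy data, invented for illustration only.
numeric = [[1.0, 10.0], [2.0, 10.0], [3.0, 30.0]]
categorical = [["a", "x"], ["a", "y"], ["b", "x"]]
D = mixed_distance_matrix(numeric, categorical)
```

A matrix like `D` could then be passed to a t-SNE implementation that accepts precomputed distances, for example scikit-learn's `TSNE(metric="precomputed", init="random")`, and the resulting 2-D embedding scored with a KNN classifier; that pairing is a suggestion, not something the abstract specifies.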
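The classifier fusion step in part (2) can be illustrated with a minimal majority-vote sketch. The abstract names neither the 22 features nor the classifiers, so the three-element feature vector, the rule-based stand-in classifiers, and the function name `recommend_charts` are all hypothetical; in practice each callable would be a trained binary model answering "is this chart type meaningful for this dataset?".

```python
from typing import Callable, Dict, List, Optional, Sequence

Features = Sequence[float]
BinaryClassifier = Callable[[Features], bool]


def recommend_charts(
    features: Features,
    classifiers: Dict[str, List[BinaryClassifier]],
    min_votes: Optional[int] = None,
) -> List[str]:
    """Return chart types whose classifiers vote 'meaningful', most votes first.

    classifiers maps a chart type to several binary classifiers for that type.
    By default a chart needs a strict majority of its classifiers' votes.
    """
    recommended = []
    for chart, models in classifiers.items():
        votes = sum(1 for model in models if model(features))
        needed = min_votes if min_votes is not None else len(models) // 2 + 1
        if votes >= needed:
            recommended.append((chart, votes))
    # Stable sort: ties keep the order in which chart types were listed.
    recommended.sort(key=lambda item: -item[1])
    return [chart for chart, _ in recommended]


# Hypothetical features: [row count, numeric columns, categorical columns].
toy_classifiers: Dict[str, List[BinaryClassifier]] = {
    "scatter": [lambda f: f[1] >= 2, lambda f: f[0] > 20, lambda f: f[2] == 0],
    "bar": [lambda f: f[2] >= 1, lambda f: f[1] >= 1, lambda f: f[0] < 1000],
    "pie": [lambda f: f[2] == 1, lambda f: f[0] < 10, lambda f: f[1] == 1],
}
charts = recommend_charts([150, 4, 1], toy_classifiers)
```

Voting per chart type lets several charts be recommended for one dataset, which matches the abstract's description of selecting multiple "meaningful" charts; the majority threshold itself is a design choice of this sketch.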
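Language: Chinese
Contribution Rank: 1
Document Type: Thesis
Identifier: http://ir.sia.cn/handle/173321/27141
Collection: Digital Factory Research Laboratory (数字工厂研究室)
Affiliation: 1. Shenyang Institute of Automation, Chinese Academy of Sciences;
2. University of Chinese Academy of Sciences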
Recommended Citation
GB/T 7714
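Wei Shichao. Research and implementation of high dimensional data visualization technology [D]. Shenyang: Shenyang Institute of Automation, Chinese Academy of Sciences, 2020.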
Files in This Item:
File Name/Size: 高维数据可视化技术研究与实现.pdf (2449KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.