制造系统生产调度和机器人学习智能研究
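Alternative Title: Research on Learning Intelligence of Production Scheduling and Robot for Manufacturing System
Author: 魏英姿 (Wei Yingzi), affiliations 1, 2
Department: 装备制造技术研究室 (Equipment Manufacturing Technology Research Laboratory)
Thesis Advisor: 赵明扬 (Zhao Mingyang)
Classification: TP18
Keywords: dynamic scheduling; reinforcement learning; robot; genetic algorithm; manufacturing system
Call Number: TP18/W59/2005
Pages: 107
Degree Discipline: Mechatronic Engineering (机械电子工程)
Degree Name: Doctor (博士)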
2005-01-28
Degree Grantor: Shenyang Institute of Automation, Chinese Academy of Sciences (中国科学院沈阳自动化研究所)
Place of Conferral: Shenyang (沈阳)
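Abstract: Whether production scheduling is reasonable and effective, and how robot technology is applied, both play an important role in the technological progress of manufacturing. Production scheduling is an NP-hard combinatorial optimization problem. To study solution methods for combinatorial optimization, this dissertation proposes a greedy genetic algorithm. Its idea is to introduce greedy strategies, based on an understanding of local knowledge, into each genetic operation of the GA. Computations on the TSP, a typical combinatorial optimization problem, show that the greedy genetic algorithm obtains satisfactory solutions with comparatively little work, and it lays a foundation for applying genetic algorithms to dynamic scheduling. Considering the constraints of scheduling problems, both soft and hard, the dissertation builds a mathematical model of the general resource-constrained scheduling problem and studies the resource-constrained single-machine dynamic scheduling problem in which multiple processing modes coexist. It proposes a chromosome encoding that satisfies the constraints to form the GA's initial population, and adopts feature-preserving genetic operations such as one-point, two-point, and uniform order crossover, with a proof of their feature-preserving property. A parallel evolution mechanism is used to construct the dynamics of the algorithm. Computational results show that this feature-preserving parallel GA performs well in both solution quality and efficiency and can fully meet the needs of dynamic scheduling. Job-shop dynamic scheduling is the most general type of scheduling, and pattern-driven scheduling (PDS) is an effective way to realize dynamic scheduling. Within the PDS framework, the dissertation studies the following: (1) A job-shop dynamic scheduling system is built on agent technology with a new distributed control architecture; scheduling tasks are allocated through interactive bidding and two-way selection among agents, and three composite rules are proposed as negotiation strategies for the contract net. (2) A composite-rule Q-learning method is proposed; an intermediate state variable describing the scheduling process, urgency, is defined, and a reward function form that precisely evaluates the quality of an action is constructed; simulation experiments verify the effectiveness of the algorithm. Raising the intelligence of robots in manufacturing cells plays an important role in extending the production capability of manufacturing systems. The dissertation gives the concept of robot skill learning, summarizes modeling methods for robot learning, surveys research on learning by demonstration and reinforcement learning, and identifies feasible current directions for robot skill learning. Reinforcement learning of complex robot skills is a relatively difficult class of learning problems, so the dissertation studies various measures to address it. Given the critical role of the reward function in a reinforcement learning system, a heuristic reward function form is designed, and its invariance of the optimal policy and the convergence of Q-value iteration are proved. The input state space is discretized at multiple scales, CMAC neural network function approximation is used, and multiple action selection strategies and a hierarchical learning strategy are applied. A simulation experiment on learning to control a bicycle verifies the effectiveness of this skill learning scheme. The research on learning intelligence in advanced manufacturing systems promotes their technological progress; it not only provides new methods and means for these problems but also extends the application domain of intelligent learning theory.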
Other Abstract: Both effective production scheduling and the application of robot technology are significant for the development of manufacturing systems. Chapter 2 presents a novel greedy genetic algorithm (GGA) for a typical combinatorial optimization problem, the Traveling Salesman Problem (TSP). The main idea of GGA is to introduce greedy selection into the genetic operations. This work shows how a greedy policy and a genetic algorithm can be usefully combined. Initial experiments demonstrate the basic promise of the approach. For the resource-constrained dynamic scheduling problem, a mathematical model is built and a parallel genetic algorithm (PGA) that satisfies the constraints is proposed. PGA adopts a permutation-based coding of the activity sequence that satisfies the priority requirements. The crossover operators are customized for this problem. It is proved that the crossover operators produce a precedence-feasible offspring genotype when applied to precedence-feasible parent individuals. Single-machine preemptive scheduling improves the performance of the scheduling system. Pattern driven scheduling (PDS) is an effective way to realize dynamic scheduling. Under the framework of PDS, Chapter 4 discusses the following problems: (1) A heterarchical scheduling architecture is adopted to solve the multi-agent cooperation problem by using an interactive bidding mechanism. Negotiations among different agents form a complete schedule. The negotiation strategy of the contract-net protocol is based on 3 composite rules. The interactive selection of agents is achieved by implementing these composite rules. (2) Composite rule selection using reinforcement learning (RL) is proposed to realize job-shop dynamic scheduling. An intermediate-state variable, pressure, is defined to describe the system feature and determine the state sequence of the search space. The concept of the jobs' estimated mean lateness (EMLT) is used to determine the amount of reward or penalty. It is important for robots in a manufacturing cell to enhance their intelligence. Chapter 5 introduces the concept of skill learning, summarizes the methods for modeling skills, and gives an overview of research on learning by demonstration and reinforcement learning in this area. Directions for robot skill learning are also derived. Complex skill learning using RL is a difficult problem, so effective strategies are introduced for solving it. The reinforcement function is the critical component because it evaluates the action and guides the learning process. Chapter 6 presents a form of heuristic reward function. Under a more general model of MDP, the policy invariance and the convergence property of Q-value iteration are proved. The automatic robot shaping policy decomposes the complex skill into a hierarchical learning process. Variable resolution discretization of the input space is introduced to improve the generalization capability of CMAC-based RL. Boltzmann distribution selection is also introduced into the ε-greedy search procedure to decrease unnecessary randomization. An example illustrates the utility of the method for learning skilled robot control online. The research work on learning intelligence of manufacturing systems accelerates the advance of manufacturing technology. The dissertation not only puts forward new methods for manufacturing systems, but also extends learning intelligence theory to new fields.
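The idea of introducing Boltzmann distribution selection into the ε-greedy search, mentioned in the abstract, can be sketched roughly as follows. This is only an illustration and not the dissertation's implementation: the abstract does not say how the two are combined, so the scheme here (greedy action with probability 1 − ε, otherwise a Boltzmann draw over the Q-values instead of a uniform draw) is an assumption, as are the names `boltzmann_probs`, `select_action`, `epsilon`, and `temperature`.

```python
import math
import random


def boltzmann_probs(q_values, temperature=1.0):
    """Softmax (Boltzmann) probabilities over a list of Q-values."""
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, epsilon=0.1, temperature=1.0, rng=random):
    """Assumed hybrid rule: exploit with probability 1 - epsilon,
    otherwise explore with a Boltzmann draw rather than a uniform one."""
    actions = range(len(q_values))
    if rng.random() >= epsilon:
        # Exploitation step: pick the action with the highest Q-value.
        return max(actions, key=lambda a: q_values[a])
    # Exploration step: actions with higher Q-values are sampled more often,
    # so exploration wastes fewer trials on clearly poor actions.
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(actions, weights=probs, k=1)[0]


if __name__ == "__main__":
    q = [0.2, 1.5, 0.9]  # hypothetical Q-values for three actions
    print(boltzmann_probs(q))
    print(select_action(q, epsilon=0.2, temperature=0.5))
```

Under this assumed scheme, lowering `temperature` makes exploratory draws concentrate on the better-valued actions, while a high `temperature` approaches the uniform exploration of plain ε-greedy.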
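Language: Chinese (中文)
Contribution Rank: 1
Document Type: Dissertation (学位论文)
Identifier: http://ir.sia.cn/handle/173321/9436
Collection: 智能产线与系统研究室 (Intelligent Production Line and System Research Laboratory)
Affiliation: 1. Shenyang Institute of Automation, Chinese Academy of Sciences (中国科学院沈阳自动化研究所)
2. Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)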
Recommended Citation
GB/T 7714
魏英姿. 制造系统生产调度和机器人学习智能研究[D]. 沈阳. 中国科学院沈阳自动化研究所,2005.
Files in This Item:
File Name/Size: 制造系统生产调度和机器人学习智能研究.p (835KB)
Access: Open Access (开放获取)
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.