From: fervvac (高远), Board: DataMining
Subject: Re: A few data warehouse questions
Site: Nanjing University Lily BBS (Mon Apr 15 10:27:10 2002), local post
I haven't read that book, so these are just some quick thoughts.
1. I don't know why they relate OODB to DW. My personal opinion about
DW is that performance is the key issue (and still not well solved). OO
systems are notorious for poor performance and, above all, many of their
optimizations cannot be applied to DW workloads. In fact, many relational
techniques cannot be applied directly either. Personally, I find this
exactly the most interesting part of research in this field.
2. Dimension modelling is a complex yet important issue in DW deployment.
However, from the research point of view, it has not received much
attention. My guess is that it is NOT an easy issue.
For your question, practitioners might have more authoritative answers.
My idea is that multiple dimensions are sometimes required, namely when
each of them covers the whole domain but partitions it in a different way.
For example, a time dimension by month and another by week. SQL Server 2000
has a feature called "virtual dimension", said to be especially good for
such cases. On the other hand, if the subclasses are disjoint, it may be
suitable to model them as one single dimension. For example, Product has
two categories (or subclasses, in OO terms), food and beverage, each with
its own attributes. If congee is counted under only one category and not
both (:p), you can have one "big" relation (pid, ...), where ... is the
union of all the attributes from both categories.
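To make the month/week case concrete, here is a minimal Python sketch (my own illustration, not anything from the book or from SQL Server). It assumes one row per day in a date dimension, and the names build_date_dim and rollup are hypothetical. The point is that weeks do not nest inside months, so the same facts roll up along two different partitions of the same time domain.

```python
from collections import defaultdict
from datetime import date, timedelta

def build_date_dim(start, days):
    """Hypothetical date dimension: one row per day with a month key and an ISO-week key."""
    dim = {}
    for i in range(days):
        d = start + timedelta(days=i)
        iso_year, iso_week, _ = d.isocalendar()
        dim[d] = {
            "month": f"{d.year}-{d.month:02d}",
            "week": f"{iso_year}-W{iso_week:02d}",
        }
    return dim

def rollup(facts, dim, level):
    """Sum fact amounts grouped by the chosen level ('month' or 'week')."""
    totals = defaultdict(int)
    for day, amount in facts:
        totals[dim[day][level]] += amount
    return dict(totals)

date_dim = build_date_dim(date(2002, 4, 1), 60)

# Made-up sales amounts; the first two days share a week but not a month.
facts = [
    (date(2002, 4, 29), 10),
    (date(2002, 5, 1), 5),
    (date(2002, 5, 6), 7),
]

by_month = rollup(facts, date_dim, "month")
by_week = rollup(facts, date_dim, "week")
print(by_month)
print(by_week)
```

April 29 and May 1 fall in the same week but in different months, so neither grouping can be derived from the other; both have to be carried as separate hierarchies (or separate dimensions) over the same days.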
This solution will, of course, leave a lot of NULL values in the
dimension table. However, the dimension table is tiny compared with the
huge fact table, so that's not a big deal.
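A rough sketch of the single "big" Product dimension, using Python's built-in sqlite3 module. The table names and the subclass attributes (weight_g for food, volume_ml for beverage) are made up for illustration; only the overall shape (pid plus the union of all subclass attributes) follows the idea above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    pid       INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    category  TEXT NOT NULL,   -- 'food' or 'beverage' (the subclass flag)
    weight_g  INTEGER,         -- food-only attribute (hypothetical)
    volume_ml INTEGER          -- beverage-only attribute (hypothetical)
);
CREATE TABLE sales_fact (
    pid INTEGER NOT NULL REFERENCES product_dim(pid),
    qty INTEGER NOT NULL
);
""")

# Each product fills only the attributes of its own subclass; the rest stay NULL.
conn.executemany(
    "INSERT INTO product_dim VALUES (?, ?, ?, ?, ?)",
    [(1, "bread", "food", 400, None), (2, "cola", "beverage", None, 330)],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?)",
    [(1, 3), (2, 5), (2, 2), (1, 1)],
)

# The fact table needs just one product key, whatever the subclass.
rows = conn.execute("""
    SELECT d.category, SUM(f.qty)
    FROM sales_fact AS f
    JOIN product_dim AS d ON d.pid = f.pid
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

# Count the NULL cells that the union-of-attributes design leaves behind.
null_cells = conn.execute(
    "SELECT SUM(weight_g IS NULL) + SUM(volume_ml IS NULL) FROM product_dim"
).fetchone()[0]

print(rows, null_cells)
```

The NULL cells grow with the number of products, not with the number of sales, which is why they cost little next to the fact table.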
From another point of view, the star schema dictates a de-normalized form
in its own right. So those NULL values, traditionally considered
troublesome, are not of much concern here.
However, dimension modelling is extremely complicated, and I am not sure
whether my argument is appropriate or not.
【 Axiao (阿肖期待涅磐中) wrote: 】
:
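: I have been reading the book "Object-Oriented Data Warehouse Design"
: (《面向对象数据仓库设计》, 人民邮电出版社), and there are a few points I
: don't understand:
: 1.
: The book presents a so-called object model containing various objects and
: their relationships, such as inheritance between parent classes and
: subclasses. But when it actually lists the object attributes, it gives only
: the parent class and its attributes, not the subclasses and theirs; the
: parent class merely has a flag attribute whose different values indicate
: which subclass a record belongs to.
: I find it far-fetched to call this a parent class/subclass relationship. If
: the different subclasses each have some special attributes, the subclasses
: should obviously be listed separately; if they have no special attributes
: and only serve to distinguish types within the parent class, then why
: describe it as a parent class/subclass relationship at all?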
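: 2.
: In a star schema, dimension tables correspond to the objects in the object
: model (according to the book). But in the book's example, product, for
: instance, is just a single product dimension. I wonder: if the product class
: in the object model has several product subclasses with their own specific
: attributes, should there be several dimension tables or one? With a single
: dimension table, that table must include the specific attributes of all the
: subclasses, so every dimension record will have many empty attributes
: (because a given product subclass does not have the attributes specific to
: the other subclasses). Isn't that a waste of space? With multiple dimension
: tables, the fact table (say, a sales fact table) has to include the keys of
: all the dimension tables. For a given fact record, only one particular
: (sub)product was sold, so apart from the key value of the product dimension
: table corresponding to that subproduct, which key values should it hold for
: the other product dimension tables?
: I am a beginner in data warehousing and may have misunderstood things. I
: would appreciate pointers from the experts. Many thanks!!!!
: (remaining quoted text omitted ... ...)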
--
※ Source: Nanjing University Lily BBS bbs.nju.edu.cn [FROM: 饮水思源BBS]