Web Clippings 20080629 « Hunter的大杂烩

2008-06-29

Web Clippings 20080629

Filed under: Architecture — hunter @ 9:54 pm

I recently came across some good material, so here are some quick reading notes.

http://www.zefhemel.com/archives/2004/09/01/the-share-nothing-architecture
Scalability is “the capability of a system to increase performance under an increased load when resources (typically hardware) are added.” (source: Wikipedia)
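The first paragraph explains what scalability means. Scalability is not the same as performance: when you double your server investment, performance doubles as well, and while you are adding servers the service keeps running without interruption... that is scalability.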
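
One way to make this definition concrete is to divide the observed speedup by the growth in resources. The sketch below is only an illustration: the function name and the throughput figures are invented here, not taken from the linked post.

```python
def scaling_efficiency(base_throughput, base_servers, new_throughput, new_servers):
    """Speedup divided by resource growth; 1.0 means linear scaling."""
    speedup = new_throughput / base_throughput
    resource_growth = new_servers / base_servers
    return speedup / resource_growth


# Doubling the servers doubles throughput: linear scaling.
print(scaling_efficiency(1000, 2, 2000, 4))  # 1.0
# Doubling the servers yields only 1.5x throughput: sub-linear scaling.
print(scaling_efficiency(1000, 2, 1500, 4))  # 0.75
```

A system can be fast on one machine and still score poorly here, which is the distinction between performance and scalability drawn above.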

A big scalability problem with caching data is called the cache-coherence problem.
The second paragraph explains a problem that seriously hurts scalability: cache coherence.

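The third paragraph pitches the shared-nothing architecture: at least on the web servers, share no data at all. Sessions can be maintained centrally through files (accessed remotely over NFS) or through a database, and the database scales reasonably well (this is how eBay does it).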
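
A minimal sketch of the idea, assuming the sessions live in a database: the web servers keep nothing between requests, so any of them can serve any request. The class names `SessionStore` and `WebServer` are made up for this example, and an in-memory SQLite database stands in for the central database.

```python
import json
import sqlite3


class SessionStore:
    """Central session storage; a stand-in for the shared database."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (sid TEXT PRIMARY KEY, data TEXT)"
        )

    def load(self, sid):
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE sid = ?", (sid,)
        ).fetchone()
        return json.loads(row[0]) if row else {}

    def save(self, sid, data):
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (sid, data) VALUES (?, ?)",
            (sid, json.dumps(data)),
        )
        self.conn.commit()


class WebServer:
    """A web server that holds no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, sid):
        session = self.store.load(sid)
        session["hits"] = session.get("hits", 0) + 1
        session["last_server"] = self.name
        self.store.save(sid, session)
        return session


store = SessionStore(sqlite3.connect(":memory:"))
web1, web2 = WebServer("web1", store), WebServer("web2", store)

web1.handle("abc")
result = web2.handle("abc")  # a different server picks up the same session
print(result)  # {'hits': 2, 'last_server': 'web2'}
```

The price of this design is that every request now touches the central store, so the store itself has to scale.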

Given that this post dates from 2004, I'll leave it at that; the author's experience back then was still fairly shallow.

=====================================
http://wutaoo.javaeye.com/blog/148369
from:http://highscalability.com/sharding-hibernate-way
Shard: splitting up your data sets. If your data doesn’t fit on one machine you split it up into pieces and each piece is called a shard.

Sharding: the process of splitting up data.
Shards is for situations where you have too much data to fit in a single database. MySQL partitioning may allow you to delay when you need to shard, but it is still a single database and you’ll eventually run into limits.

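The difference between sharding and partitioning: partitioning happens inside a single database, whereas sharding is a way of dividing data that mostly shows up as spreading it across multiple database servers.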

Control over how data are distributed is determined by a pluggable strategies layer.
— this is how Hibernate implements it
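
To show what a pluggable strategy layer can look like, here is a small sketch in Python. It is not the Hibernate Shards API: `ShardStrategy`, `HashStrategy`, `RangeStrategy` and `ShardedStore` are names invented for this example, and plain dictionaries stand in for databases.

```python
import zlib
from abc import ABC, abstractmethod


class ShardStrategy(ABC):
    """Decides which shard a key belongs to."""

    @abstractmethod
    def shard_for(self, key: str) -> int:
        ...


class HashStrategy(ShardStrategy):
    def __init__(self, shard_count):
        self.shard_count = shard_count

    def shard_for(self, key):
        # crc32 is stable across processes, unlike the built-in hash() for str.
        return zlib.crc32(key.encode("utf-8")) % self.shard_count


class RangeStrategy(ShardStrategy):
    """Routes numeric ids by fixed-size ranges."""

    def __init__(self, ids_per_shard):
        self.ids_per_shard = ids_per_shard

    def shard_for(self, key):
        return int(key) // self.ids_per_shard


class ShardedStore:
    """Stores values in whichever shard the strategy picks."""

    def __init__(self, strategy, shard_count):
        self.strategy = strategy
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[self.strategy.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.strategy.shard_for(key)].get(key)


store = ShardedStore(RangeStrategy(ids_per_shard=1000), shard_count=3)
store.put("42", "alice")    # lands in shard 0
store.put("2500", "bob")    # lands in shard 2
print(store.get("42"), store.get("2500"))  # alice bob
```

Because the store only talks to the strategy interface, swapping `RangeStrategy` for `HashStrategy` changes the data distribution without touching the calling code.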

Plan for the future by picking a strategy that will last you a long time.

Repartitioning/resharding the data is operationally very difficult. No management tools for this yet

— choosing the sharding strategy is very important; undoing a bad choice later is painful
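
A quick calculation shows why resharding hurts. Assuming the simplest strategy, routing by `key % shard_count` (an assumption of this sketch, not something the article prescribes), going from 4 to 5 shards relocates most of the keys:

```python
def moved_fraction(keys, old_count, new_count):
    """Fraction of keys whose shard changes under key % shard_count routing."""
    moved = sum(1 for k in keys if k % old_count != k % new_count)
    return moved / len(keys)


user_ids = range(100_000)  # hypothetical sequential user ids
print(moved_fraction(user_ids, 4, 5))  # 0.8
print(moved_fraction(user_ids, 4, 8))  # 0.5
```

Even doubling the shard count moves half of the data, which is why the strategy is worth picking carefully up front.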

The rest of the article mostly covers the limitations and features of this strategy layer.

=====================================
http://www.mysqlperformanceblog.com/2008/03/14/sharding-and-time-base-partitioning/
However this may be not the most optimal approach by itself because not all data belonging to same user is equal.
— explains that the shard model does not suit every kind of data, especially when pieces of data differ in importance or in how hot they are

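— the author suggests that some data can be partitioned further on top of sharding, along a time dimension or a hotness dimension
— the author seems to believe that once you shard you no longer need a master-slave architecture. In a large system you still need master-slave to make each node more robust. A 1 master : n slaves setup also raises the maximum utilization of a single node, and by placing the heavy queries of non-critical workloads on one of the slaves you keep those workloads from affecting the critical ones.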
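
The suggestion of partitioning by time within a shard can be sketched as follows. The modulo routing, the one-table-per-month layout and the `messages_YYYY_MM` naming are all assumptions made for this example.

```python
from datetime import date


def locate(user_id, created, shard_count=4):
    """Return (shard index, table name) for a row owned by user_id."""
    shard = user_id % shard_count
    table = f"messages_{created.year:04d}_{created.month:02d}"
    return shard, table


def recent_tables(today, months):
    """Table names for the last `months` months, newest first."""
    names = []
    year, month = today.year, today.month
    for _ in range(months):
        names.append(f"messages_{year:04d}_{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return names


print(locate(10, date(2008, 3, 14)))
# (2, 'messages_2008_03')
print(recent_tables(date(2008, 1, 15), 3))
# ['messages_2008_01', 'messages_2007_12', 'messages_2007_11']
```

Queries for recent, hot data then touch only the newest tables, while old tables can be archived or moved to cheaper storage.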

http://mysqldba.blogspot.com/2006/11/unorthodox-approach-to-database-design.html
— the Friendster architecture
— Each 64bit AMD server would house 500K distinct users and all their data
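— user-centric: all the information related to a user is stored together
— mentions several problems with replication; of these, “IO bandwidth is low replication lags, causing slave lag” is the one we most need to weigh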
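
One way to live with slave lag inside a shard is to route queries by kind and skip slaves that have fallen too far behind. This is a sketch under assumptions: the `Slave` and `ShardRouter` names, the 5-second threshold and the hard-coded lag values are invented here; in practice the lag would come from monitoring.

```python
import random
from dataclasses import dataclass


@dataclass
class Slave:
    name: str
    lag_seconds: float  # would come from monitoring; fixed here for illustration


class ShardRouter:
    """Picks a server inside one shard for each kind of query."""

    def __init__(self, master, slaves, reporting_slave, max_lag=5.0):
        self.master = master
        self.slaves = slaves
        self.reporting_slave = reporting_slave
        self.max_lag = max_lag

    def route(self, kind):
        if kind == "write":
            return self.master
        if kind == "report":
            # Heavy, non-critical queries go to a dedicated slave.
            return self.reporting_slave
        # Ordinary reads use only slaves that are not lagging too far behind.
        fresh = [s.name for s in self.slaves if s.lag_seconds <= self.max_lag]
        return random.choice(fresh) if fresh else self.master


router = ShardRouter(
    master="master",
    slaves=[Slave("slave1", 1.0), Slave("slave2", 30.0)],
    reporting_slave="slave3",
)
print(router.route("write"), router.route("read"), router.route("report"))
# master slave1 slave3
```

Falling back to the master when every slave is stale trades some master load for read consistency; a real system might prefer to serve slightly stale data instead.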

=====================================
http://highscalability.com/unorthodox-approach-database-design-coming-shard
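Lists the advantages of sharding:
a. High availability
   — an extra advantage of sharding is that the system can still provide partial service
b. Faster queries.
c. More write bandwidth
d. You can do more work
   — concurrent throughput goes up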
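
The points about partial service and concurrent throughput can be illustrated with a parallel fan-out that tolerates a failed shard. `query_shard` is a hypothetical stand-in for a real per-shard query, and shard 2 is made to fail on purpose.

```python
from concurrent.futures import ThreadPoolExecutor


def query_shard(shard_id):
    """Hypothetical per-shard query; shard 2 simulates an outage."""
    if shard_id == 2:
        raise ConnectionError(f"shard {shard_id} is down")
    return [f"user-{shard_id}-{n}" for n in range(2)]


def fan_out(shard_ids):
    """Query all shards in parallel; return (rows, failed shard ids)."""
    rows, failed = [], []
    with ThreadPoolExecutor(max_workers=len(shard_ids)) as pool:
        futures = {sid: pool.submit(query_shard, sid) for sid in shard_ids}
        for sid, future in futures.items():
            try:
                rows.extend(future.result(timeout=1.0))
            except Exception:
                # One broken shard costs only its own share of the data.
                failed.append(sid)
    return rows, failed


rows, failed = fan_out([0, 1, 2, 3])
print(len(rows), failed)  # 6 [2]
```

Users whose data lives on shards 0, 1 and 3 are still served while shard 2 is unavailable.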
How sharding differs from a traditional architecture:

a. Data are denormalized
   — a denormalized design: data sharing the same primary key is stored together (the first time I have seen it argued this way)
   — You can keep a user’s profile data separate from their comments, blogs, email, media, etc, but the user profile data would be stored and retrieved as a whole
b. Data are parallelized across many physical instances
c. Data are kept small
d. Data are more highly available
   — You can also set up a shard to have a master-slave or dual master relationship within the shard to avoid a single point of failure within the shard
e. It doesn’t use replication
   — data is split or moved by means other than replication, which clears up a common misunderstanding about sharding. Scaling purely through replication runs into a write bottleneck (LiveJournal had painful experience with exactly this).
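
The write bottleneck is easy to see with a little arithmetic. The sketch assumes an idealized model: with pure replication every node applies every write, while with evenly balanced shards each node applies only its own share. The figure of 8000 writes per second is made up.

```python
def writes_per_node_replicated(total_writes, nodes):
    """With pure replication every node applies every write."""
    return total_writes


def writes_per_node_sharded(total_writes, shards):
    """With evenly balanced shards each node applies only its share."""
    return total_writes / shards


total = 8000  # hypothetical writes per second across the whole system
for n in (1, 2, 4, 8):
    print(n, writes_per_node_replicated(total, n), writes_per_node_sharded(total, n))
```

Adding replicas increases read capacity but leaves the per-node write load unchanged, whereas adding shards divides it.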

Problems with sharding:
1. Rebalancing data

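    — if some users' data grows too large, the data volume has to be rebalanced across nodes, which is a painful process (Google has automatic rebalancing). Flickr has a global name service for locating data. For a traditional architecture that routes by modulo, this is a problem that has to be solved sooner or later.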
    — And your references must be invalidateable so the underlying data can be moved while you are using it.
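    — my reading of this: each data source also holds related reference data (a friend's nick, for example). This reference data reduces the dependence on the primary (base) data, so that some features can keep working even while the base data is being migrated.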
2. Joining data from multiple shards
   You have to make individual requests to your data sources, get all the responses, and then build the page
   — Amazon uses a parallel query mechanism to improve query efficiency; this is worth learning from
3. How do you partition your data in shards?
— Unfortunately there are no easy answers to these questions.
— indeed, no two businesses are exactly alike; you have to weigh this against your own business
4. Less leverage
— there is little literature on this; most of the time “you are on your own”
5. Implementing shards is not well supported
— you have to walk the road yourself and build the tools yourself
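
For the rebalancing problem, a lookup directory is one alternative to modulo routing: a user's location becomes a table entry that can be changed one user at a time. The sketch below is a toy, not Flickr's actual name service; `ShardDirectory` and `Cluster` are invented names, dictionaries stand in for databases, and the version counter is just one possible way to make a cached location detectably stale.

```python
class ShardDirectory:
    """Maps user ids to shard ids (a toy stand-in for a global name service)."""

    def __init__(self):
        self._location = {}
        self._version = {}

    def assign(self, user_id, shard_id):
        self._location[user_id] = shard_id
        self._version[user_id] = self._version.get(user_id, 0) + 1

    def lookup(self, user_id):
        return self._location[user_id], self._version[user_id]

    def is_current(self, user_id, version):
        """True if a location fetched at `version` is still valid."""
        return self._version[user_id] == version


class Cluster:
    def __init__(self, shard_count):
        self.directory = ShardDirectory()
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, user_id, data, shard_id):
        self.shards[shard_id][user_id] = data
        self.directory.assign(user_id, shard_id)

    def get(self, user_id):
        shard_id, _ = self.directory.lookup(user_id)
        return self.shards[shard_id][user_id]

    def move(self, user_id, new_shard):
        old_shard, _ = self.directory.lookup(user_id)
        if new_shard == old_shard:
            return
        # Copy first, then flip the directory entry, then drop the old copy.
        self.shards[new_shard][user_id] = self.shards[old_shard][user_id]
        self.directory.assign(user_id, new_shard)
        del self.shards[old_shard][user_id]


cluster = Cluster(shard_count=3)
cluster.put(7, {"nick": "alice"}, shard_id=0)
_, cached_version = cluster.directory.lookup(7)  # a client caches the location

cluster.move(7, new_shard=2)  # rebalance one oversized user
print(cluster.directory.is_current(7, cached_version))  # False
print(cluster.get(7))  # {'nick': 'alice'}
```

The directory itself becomes a critical component that must be highly available and fast, which is the cost of this flexibility.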

=====================================

http://blog.gigaspaces.com//2007/04/06/shared-nothing-architecture-redefined/
— Today we have better ways to remove data dependency, without putting the data into a shared file system — which may eventually become a bottleneck. We partition it and store it in-memory
— GigaSpaces often boasts about its space-based architecture, which in essence seems to be much the same as SNA (shared-nothing architecture).

http://en.wikipedia.org/wiki/Shard
–database sharding is a method for database partitioning which involves partitioning across multiple servers in a shared nothing architecture.
–the term shard seems to come from MMOGs
