[US] Credit Cards with No Foreign Transaction Fee

Using a US credit card in another country normally incurs a foreign transaction fee; this should be common knowledge. The fee typically ranges from 0% to 3%. Visa/Mastercard/AmEx/Discover first take their own cut: Visa, for example, charges an "International Service Assessment (ISA)" fee of 0.15% to 1% (source), and the issuing bank may add a fee of its own on top. Fortunately, some credit cards advertise no foreign transaction fee!

While browsing around the web, I found that someone has already compiled a list of 36 US credit cards with no foreign transaction fee [English site]. A deal this good is naturally worth sharing. The list was last updated in July 2012, so some entries may have changed; use it with discretion.

In addition, Chase's website also lists several credit cards with no foreign transaction fee.

Finally, for fellow Purdue students: PEFCU's credit card / debit card foreign transaction fee is 0.8% / 1.0%, that is, 0.8% if the transaction is denominated in US dollars and 1.0% if it is not. Source

Eating Vegetarian in the US

Sharing some tips on how to conveniently find vegetarian food in the US.

For vegetarians, this experience is useful whether you are studying abroad or traveling independently in the US! First, if you are in California, you can basically skip this article: the major Californian cities all have large Chinese communities, so it is easy to find vegetarian Chinese food, and even vegetarian Taiwanese food.

This article is for those living in the Midwest, with no decent Chinese restaurant within a hundred miles.

Continue reading: Eating Vegetarian in the US

1100th Changeset in Mace-fullcontext

I checked in the 1100th changeset in mace-fullcontext this week. Most of the recent changesets focused on runtime optimization. The runtime can now process nearly 250,000 events per second, 10 times faster than six months ago. It can also process nearly 100,000 network message events per second, twice as many as a month ago. (Network message events involve network sockets and serialization, so their throughput is lower than that of non-network-message events.)

Optimization (My Experience)

My recent task was optimizing a parallel event-driven system called ContextLattice. I’ve done a decent job of making it process events faster: specifically, runtime performance went from ~10,000 events/s to ~200,000 events/s*, a 20x improvement. Looking back, that is remarkable, so I’d like to share the experience.

*: data obtained on an 8-core Intel Xeon 2.0 GHz machine.

So what did I do? General rule of thumb:

  1. Use gprof. Premature optimization is the root of all evil: never attempt to optimize something until you have run it. gprof makes it clear which functions and data structures take up most of the execution time. There are other performance analysis tools, but gprof is well known and I did not need the others.
  2. Replace global locks with finer-grained ones. In a multi-threaded program, a global lock is typically what constrains the system’s performance, and unfortunately the original system had several. If possible, replace a global lock with several non-global locks; whether you can depends on the program’s semantics.
  3. Change data structures. Data structures impact performance a lot. Sometimes it’s hard to tell which one is best (what’s the best underlying container for std::queue: std::deque or std::list?**), so just experiment with all the candidates. I use the C++ standard library and Boost data structures for the most part, which makes substitution easy.
  4. Use a memory pool. If there is a lot of dynamic memory allocation and deallocation, you can reduce it by recycling unused memory blocks. Boost has a memory pool implementation, but I ended up creating my own because I had no experience with Boost’s. What I built was a simple FIFO queue of pointers, and it proved useful. I’d like to hear from anyone who has experience with the Boost memory pool.
  5. Reduce the work in non-parallelizable code. Ignore the functions that already run in parallel with others, as they don’t impact performance much. Focus on the code protected by the global lock, and relocate it into parallelizable functions where possible.
  6. Reduce unnecessary memory copies. ContextLattice came from Mace. I used Mace APIs for some of the implementation, but it turned out some of them did not fit well, and I had to write special-purpose object constructors to avoid memory copies.

**: 1. std::queue uses std::deque as the underlying container by default. I found that using std::list as the container is faster, at least in the situation I encountered.

2. Also, std::map was faster than std::tr1::unordered_map (a hash map) in the situation I encountered (each map held only a few items, but many such maps were created). Similarly, std::set was faster than std::tr1::unordered_set (a hash set).

3. std::map iterates its entries in sorted key order (for example, with an integer key the iterator starts from the item with the smallest key), so you can actually use it as a priority queue. But std::priority_queue is still faster if you only need to access the front of the structure.

There are other optimization experiences specific to ContextLattice. They are also interesting, but I can’t explain them without introducing the internals of the system, so I’ll probably find some time to cover them in detail later.

June

Haven’t updated my blog lately. I was mostly busy with a paper submission due a few days ago. Now that the paper is submitted, I am ready to prepare another one.

Time flies, and less than 1.5 months of my happy summer remain. I hope to finish one paper (expecting to get it done by the end of July), start working on another, and help my labmates with theirs.