It should be common knowledge that using a US credit card in another country incurs a foreign transaction fee. The fee typically ranges from 0% to 3%: Visa/MasterCard/AmEx/Discover take a cut first. Visa, for example, charges an "International Service Assessment (ISA)" fee of 0.15% to 1% (source), and the issuing bank may add a surcharge of its own on top. Fortunately, some credit cards advertise no foreign transaction fee!
I checked in the 1,100th changeset to mace-fullcontext this week. Most of the recent changesets focused on runtime optimization. The runtime can now process nearly 250,000 events per second, 10 times faster than six months ago. It can also process nearly 100,000 network message events per second, twice as many as a month ago. (Network message events involve network sockets and serialization, so their throughput is lower than that of non-network-message events.)
My recent task was optimizing a parallel event-driven system called ContextLattice. I did a decent job of making it process events faster: the runtime went from ~10,000 events/s to ~200,000 events/s*, a 20x improvement. Looking back, that is incredible, so I'd like to share the experience.
*: data obtained from an 8-core Intel Xeon 2.0 GHz machine.
So what did I do? General rule of thumb:
- Use gprof. Premature optimization is the root of all evil: never attempt to optimize something until you have run and profiled it. gprof makes it clear which functions and data structures take up most of the execution time. There are other performance analysis tools, but gprof is well known and it was the only one I used.
- Replace global locks with finer-grained ones. In a multi-threaded program, a global lock is typically what constrains the system's performance, and unfortunately, the original system had several. If possible, replace a global lock with several non-global locks; whether that is safe, of course, depends on the program's semantics.
- Change data structures. Data structures affect performance a lot. Sometimes it's hard to tell which one is best (what's the best backing container for std::queue: std::deque or std::list?**), so just experiment with all the candidates. I use the C++ standard library and Boost data structures for the most part, which makes substitution easy.
- Use a memory pool. If there is a lot of dynamic memory allocation/deallocation, you can reduce it by recycling unused memory blocks. Boost has a memory pool implementation, but I ended up creating one myself, because I had no experience with Boost's pool. What I built was a simple FIFO queue of pointers, and that proved useful. I'd like to hear from anyone with experience with the Boost memory pool.
- Reduce the work in non-parallelizable code. Ignore the functions that already run in parallel with others, as they don't impact performance much. Focus on the code protected by the global lock, and relocate work into parallelizable functions where possible.
- Reduce unnecessary memory copies. ContextLattice came from Mace, and I used Mace APIs for some of the implementation, but some of them turned out not to fit well, so I had to write special-purpose object constructors to avoid extra copies.
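To make the finer-grained locking rule above concrete, here is a minimal sketch (hypothetical names, not ContextLattice code) of a counter table that replaces one global mutex with per-shard mutexes, so threads touching different keys no longer serialize on a single lock:

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sharded counter table: each shard has its own mutex,
// so only operations that hash to the same shard contend.
class ShardedCounters {
public:
    explicit ShardedCounters(size_t shards = 16)
        : locks_(shards), maps_(shards) {}

    void increment(const std::string& key) {
        size_t s = shard_of(key);
        std::lock_guard<std::mutex> guard(locks_[s]);  // locks one shard only
        ++maps_[s][key];
    }

    long get(const std::string& key) {
        size_t s = shard_of(key);
        std::lock_guard<std::mutex> guard(locks_[s]);
        auto it = maps_[s].find(key);
        return it == maps_[s].end() ? 0 : it->second;
    }

private:
    size_t shard_of(const std::string& key) const {
        return std::hash<std::string>{}(key) % maps_.size();
    }
    std::vector<std::mutex> locks_;
    std::vector<std::unordered_map<std::string, long>> maps_;
};
```

Whether keys can be partitioned like this depends, as noted, on the program's semantics: operations that span shards still need coordination.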
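The memory-pool idea, a FIFO queue of recycled pointers, might look roughly like this sketch (the SimplePool name and interface are made up for illustration; the pool is not thread-safe here):

```cpp
#include <new>
#include <queue>

// Hypothetical sketch of the "FIFO queue of pointers" pool: instead of
// returning blocks to the allocator, recycle them through a free queue.
// A real runtime would guard this with a lock or keep one pool per thread.
template <typename T>
class SimplePool {
public:
    ~SimplePool() {
        while (!free_.empty()) {
            ::operator delete(free_.front());  // release recycled blocks
            free_.pop();
        }
    }
    void* allocate() {
        if (free_.empty()) return ::operator new(sizeof(T));
        void* p = free_.front();  // reuse the oldest recycled block
        free_.pop();
        return p;
    }
    void deallocate(void* p) { free_.push(p); }  // recycle instead of freeing
private:
    std::queue<void*> free_;
};
```

A caller would construct objects into the returned block with placement new and call the destructor manually before handing the block back.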
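Shrinking the serialized region can be as simple as moving preparation work outside the lock. A sketch under assumed names (serialize and enqueue_event are hypothetical, not Mace APIs):

```cpp
#include <mutex>
#include <queue>
#include <string>
#include <utility>

std::mutex queue_lock;
std::queue<std::string> out_queue;

// Stand-in for some expensive per-event preparation (e.g. serialization).
std::string serialize(int id) { return "event-" + std::to_string(id); }

void enqueue_event(int id) {
    std::string msg = serialize(id);         // parallelizable: done outside the lock
    std::lock_guard<std::mutex> g(queue_lock);
    out_queue.push(std::move(msg));          // critical section is now just a push
}
```

Doing the serialization inside the lock would serialize all producers; doing it outside leaves only the queue push contended.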
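As an illustration of what a copy-avoiding "special constructor" can look like (this is not the actual Mace or ContextLattice API), a constructor can swap a buffer's contents in instead of copying them, an O(1) transfer that works even without C++11 move semantics:

```cpp
#include <string>

// Hypothetical message type: the constructor steals the caller's buffer
// via swap instead of copying it, leaving the source string empty.
struct Message {
    std::string payload;
    explicit Message(std::string& buf) { payload.swap(buf); }  // no copy
};
```

The trade-off is that the source object is visibly emptied, so this only fits call sites that are done with the buffer.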
**: 1. std::queue uses std::deque as its internal container by default. I found std::list to be faster as the container, at least in the situation I encountered.
2. Also, std::map was faster than std::tr1::unordered_map (a hash map) in the situation I encountered (each map held only a few items, but many such maps were created). Similarly, std::set was faster than std::tr1::unordered_set (a hash set).
3. std::map's iterator traverses the structure in sorted order (for example, if the key type is an integer, iteration starts from the item with the smallest key), so you can actually use it as a priority queue. But std::priority_queue is still faster if you only need to access the front of the structure.
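The experiment in footnote 1 is easy to reproduce, because the backing container of std::queue is a template parameter (std::deque by default), so swapping in std::list is a one-line change with an identical interface:

```cpp
#include <list>
#include <queue>

// Same adapter interface, different backing container: only the
// declaration changes, so both variants can be benchmarked side by side.
std::queue<int> q_default;               // std::queue<int, std::deque<int>>
std::queue<int, std::list<int>> q_list;  // std::list as the backing container
```

Which backing container wins depends on allocation patterns and element sizes, which is exactly why measuring beats guessing here.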
There are other optimizations specific to ContextLattice. They are also interesting, but I can't explain them without introducing the internals of the system, so I'll probably find some time to discuss them in detail later.
I haven't updated my blog lately; I was mostly busy with a paper submission due a few days ago. Now that the paper is submitted, I am ready to start preparing another one.
Time flies: less than 1.5 months of my happy summer remain. I hope to finish one paper (expecting to get it done by the end of July), start work on another, and help my labmates with theirs.