An application needs to maintain a set of unique customer IDs (type String) and frequently check if an ID is already present. The set is expected to contain up to 100,000 IDs. The current implementation uses a TreeSet, but performance tests show that the contains() operation is slower than desired. The developer considers switching to a HashSet. However, the business requires that when iterating the set, IDs must appear in sorted order. The developer proposes to convert the HashSet to a sorted list each time iteration is needed. Iteration occurs rarely (once per hour). What is the best approach?
Fast contains() and sorting once per hour is fine.
Why this answer
Option D is correct because the primary performance bottleneck is the contains() operation, which is O(log n) for TreeSet but O(1) average for HashSet. Since iteration with sorted order is required only once per hour, the cost of sorting the HashSet into a list (O(n log n)) is negligible compared to the frequent contains() checks. This trade-off optimizes for the dominant use case while still meeting the sorted iteration requirement.
Exam trap
The trap here is that candidates often assume sorted iteration must be maintained at all times, overlooking the fact that the requirement is only for rare iteration, making the conversion cost acceptable in exchange for faster contains().
How to eliminate wrong answers
Option A is wrong because ConcurrentSkipListSet is a thread-safe sorted set with O(log n) contains() performance, which does not improve over TreeSet and adds unnecessary concurrency overhead for a single-threaded scenario. Option B is wrong because LinkedHashSet maintains insertion order, not sorted order, so iterating it does not produce IDs in sorted order. Option C is wrong because keeping TreeSet retains the slower O(log n) contains() performance, which is the exact problem the developer is trying to solve.