How to Prepare the Best Keyword Database on the Market?
Build up mass
When we set out to build our keyword base, which is without doubt the most important element of any service like ours, we decided to think big. From most of the available sources, such as Yandex's autocomplete suggestions, data from Metrica counters (which were still open at the time), and data from Liveinternet counters, we collected approximately 1.5 billion keywords. We checked the Wordstat search volume for all of them and selected the 150,000,000 most popular. When we sorted the entire 1.5 billion by the "Search volume" column, row #150,000,000 had 4 impressions per month. We decided that this was the very cutoff we were looking for.
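The selection step can be illustrated with a minimal sketch. It assumes the search volumes are already available as a keyword-to-impressions mapping; the function name and the sample data are hypothetical, and a real run over 1.5 billion rows would need external sorting rather than an in-memory dictionary.

```python
import heapq


def select_top_keywords(volumes, limit):
    """Return the `limit` most searched keywords and the volume of the last one kept."""
    # nlargest avoids fully sorting the collection when limit is much smaller than it.
    top = heapq.nlargest(limit, volumes.items(), key=lambda item: item[1])
    cutoff = top[-1][1] if top else None
    return [keyword for keyword, _ in top], cutoff


# Hypothetical sample: monthly impressions per keyword.
sample = {
    "buy laptop": 12000,
    "laptop repair moscow": 340,
    "cheap laptop bag red": 4,
    "laptop bag with purple dots": 1,
}

top_keywords, cutoff_volume = select_top_keywords(sample, 3)
print(top_keywords, cutoff_volume)
```

The returned cutoff plays the same role as the "4 impressions per month" figure: it is the search volume of the last keyword that made it into the selection.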
Unique filtration and merging algorithms
We developed an algorithm for detecting rephrased keywords and keywords that contain the same words in a different order. After long and hard experimentation with these approaches to cleaning up our database, we got what we believed were good results. Using this approach we selected the most unique and most frequently searched keywords, a little more than 70,000,000 phrases in total, which became our starting point. But the time for change has come.
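The actual algorithm is not described here, so the following is only a sketch of one common way to catch keywords made of the same words in a different order: reduce each keyword to a sorted tuple of its words and keep the most searched variant per tuple. The function names and the sample data are hypothetical, and detecting rephrased keywords (synonyms, different word forms) is not covered.

```python
def canonical_key(keyword):
    """Order-independent key: the keyword's lowercased words, sorted."""
    return tuple(sorted(keyword.lower().split()))


def collapse_reordered(volumes):
    """Keep only the most searched variant among keywords with the same words."""
    best = {}
    for keyword, volume in volumes.items():
        key = canonical_key(keyword)
        if key not in best or volume > best[key][1]:
            best[key] = (keyword, volume)
    return {keyword: volume for keyword, volume in best.values()}


# Hypothetical sample: the first two keywords differ only in word order.
sample = {
    "buy red shoes": 900,
    "red shoes buy": 150,
    "hotel moscow": 700,
}

unique = collapse_reordered(sample)
print(unique)
```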
After all, our forte is nothing other than spying and competition!
Some time ago our service was periodically criticized for lacking "those very special commercial keywords". To minimize such criticism, we tried to get as many keywords as possible from competing services and added the resulting keyword sets (for each competitor, we estimate that we obtained 80-85% of its database) to our original database of 150 million keywords. After merging our database with those of our competitors, the number of keywords did not change greatly: only about 15 million were added.
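A merge like this mostly finds duplicates, which is why the total grew so little. As a rough sketch, assuming keywords are compared after lowercasing and whitespace cleanup (the normalization rule, function names and sample data are all hypothetical):

```python
def normalize(keyword):
    """Lowercase and collapse repeated whitespace so equal keywords compare equal."""
    return " ".join(keyword.lower().split())


def merge_keyword_sets(own, competitors):
    """Merge competitor keyword lists into our own; return the result and the number added."""
    merged = {normalize(keyword) for keyword in own}
    before = len(merged)
    for keywords in competitors:
        merged.update(normalize(keyword) for keyword in keywords)
    return merged, len(merged) - before


# Hypothetical sample data.
own = ["buy laptop", "laptop repair"]
competitors = [
    ["Buy  Laptop", "laptop lease"],
    ["laptop lease", "laptop insurance"],
]

merged, added = merge_keyword_sets(own, competitors)
print(len(merged), added)
```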

Then we deleted keywords with repeated words and unacceptable characters, as well as all those longer than 7 words – data on rates for such long keywords cannot be retrieved from Yandex.Direct.
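The three cleanup rules can be sketched as a single predicate. The 7-word limit comes from the text above; the set of acceptable characters is an assumption made for illustration (letters, digits, underscore, whitespace, dot, plus, hyphen), since the actual list is not given.

```python
import re

MAX_WORDS = 7  # limit stated in the text
# Assumed character whitelist; \w also matches Cyrillic letters for str patterns.
ALLOWED_CHARS = re.compile(r"[\w\s.+-]+")


def is_acceptable(keyword):
    """Return True if the keyword passes all three cleanup rules."""
    words = keyword.lower().split()
    if not words or len(words) > MAX_WORDS:
        return False  # empty or too long
    if len(set(words)) != len(words):
        return False  # contains a repeated word
    return ALLOWED_CHARS.fullmatch(keyword) is not None


candidates = [
    "buy cheap laptop",
    "buy buy laptop",
    "laptop @ home",
    "one two three four five six seven eight",
]
cleaned = [keyword for keyword in candidates if is_acceptable(keyword)]
print(cleaned)
```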
Yandex's new operator
Yandex gave us a gift by adding a new keyword operator: square brackets. This operator fixes word order, so now we can finally know for sure that the keyword "ticket Moscow Petersburg" is noticeably more popular than "ticket Petersburg Moscow". The entire Russian Internet had been waiting for such an operator, and we could not miss the event. We checked the search volume in the new format for our approximately 160 million remaining keywords.

By combining the new operator, which fixes word order, with the old "Quotation marks" and "Exclamation point" operators, we could finally obtain reliable data on the actual number of impressions for a specific keyword in a specific form. All our know-how for selecting the more likely forms became totally unnecessary, to much rejoicing on our part. We sorted our 160 million keywords by "[!yearly !search !volume !in !the !new !format]" and selected 80 million keywords, which became the foundation of the database for the Yandex: Moscow region.
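To show how the three operators combine, here is a small sketch that wraps a keyword in the same format as the sort column above: quotation marks around the whole phrase, square brackets to fix word order, and an exclamation point before each word to fix its form. The function name is hypothetical, and the sketch only builds the query string; it does not call any Yandex service.

```python
def exact_query(keyword):
    """Wrap a keyword in the quotation mark, square bracket and exclamation point operators."""
    words = keyword.split()
    fixed_forms = " ".join("!" + word for word in words)
    return '"[' + fixed_forms + ']"'


# The two word orders now produce two distinct queries.
print(exact_query("ticket Moscow Petersburg"))
print(exact_query("ticket Petersburg Moscow"))
```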

For additional regions (Petersburg, Ekaterinburg, Minsk, Kiev, Rostov-on-Don, Nizhny Novgorod, Krasnodar) we took the top keywords from our original base, keeping only as many keywords as we can process in a timely manner. For Petersburg, Minsk and Kiev this number is 20,000,000; for the other regions it is more than 13,000,000 keywords. Precise data is always available on our system statistics page.