Overall comments:
I think it's a very good entry-level talk that explains what goes into query parsing and detection, and how it's a complex and interesting problem in its own right. I have to get a link to the slides and post it here. Meanwhile, here is my raw blog post, which I will clean up after the sessions are over.
Details
Amazon search queries are parsed and then run through algorithms that pull up recommended products based on the query and display them.
However, the queries are a combination of structured and unstructured information, plus user behavior info from button clicks.
Parsing the query from free text works by tagging the query with keywords like gender, brand, price, prime, shoes etc. The tagged query is then sent to the back end. The tagging is advanced to the extent that a query containing c300 is mapped to Mercedes Benz.
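To make the tagging idea concrete for myself, here is a minimal sketch in Python. The lexicon, the tag names and the `tag_query` helper are all made up by me for illustration; this is not how Amazon does it, and a real tagger would presumably be learned from data rather than read from a hand-written dictionary.

```python
# Hypothetical lexicon: token -> (tag, normalized value).
# Every entry here is invented for illustration only.
LEXICON = {
    "mens": ("gender", "men"),
    "womens": ("gender", "women"),
    "nike": ("brand", "Nike"),
    "c300": ("brand", "Mercedes Benz"),
    "shoes": ("category", "shoes"),
    "prime": ("shipping", "prime"),
}


def tag_query(query):
    """Split a free-text query into recognized tags and leftover words."""
    tags = {}
    leftover = []
    for token in query.lower().split():
        if token in LEXICON:
            tag, value = LEXICON[token]
            tags[tag] = value
        else:
            leftover.append(token)
    return tags, leftover


print(tag_query("c300 floor mats"))
# ({'brand': 'Mercedes Benz'}, ['floor', 'mats'])
```

The structured part (the tags) can be passed to the back end as filters, while the leftover words remain ordinary free-text search terms.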
The speaker aims to increase precision through filtering and ranking, and to improve the customer experience.
Filtering refers to filtering out products that don't match. The tags that are identified are used to filter results.
The query is run through a bunch of canary query classifiers. The resulting class is then annotated with the filter specifications, and the search engine is then queried with the annotated query.
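Here is a rough sketch of what tag-based filtering could look like. The product records, attribute names and the `filter_products` helper are my own assumptions for illustration, not anything shown in the talk.

```python
# Hypothetical product catalog; the fields are invented for illustration.
PRODUCTS = [
    {"title": "Nike running shoes", "brand": "Nike", "category": "shoes"},
    {"title": "Adidas running shoes", "brand": "Adidas", "category": "shoes"},
    {"title": "Nike backpack", "brand": "Nike", "category": "bags"},
]


def filter_products(products, tags):
    """Keep only the products whose attributes agree with every tag."""
    return [
        product
        for product in products
        if all(product.get(key) == value for key, value in tags.items())
    ]


matches = filter_products(PRODUCTS, {"brand": "Nike", "category": "shoes"})
print([product["title"] for product in matches])
# ['Nike running shoes']
```

With no tags at all, nothing is filtered out, so the precision gain depends entirely on how many tags the parser manages to identify.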
Annotations involve identifying phrases: for example, dress belt is a phrase and would take on a different meaning if dress and belt were split. ‘Dress with belt’ will match dress belt as a result of this annotation. Identifying phrases is quite interesting and I think the slides capture it well. (Link to slides when they are available). I liked how she explained lexical datasets and reformulation based on confidence between phrases. They use frequency, length boosting and lexical boosting to identify phrase boundaries.
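I did not note down the actual formulas, so the following is only my guess at how frequency, length boosting and lexical boosting might combine to pick phrase boundaries. The counts, the phrase lexicon, the boost values and the function names are all assumptions of mine.

```python
import math

# Hypothetical counts of how often each candidate shows up in past queries.
QUERY_COUNTS = {
    "black": 1500,
    "dress": 900,
    "belt": 700,
    "black dress": 300,
    "dress belt": 400,
}
# Hypothetical lexical dataset of known phrases.
KNOWN_PHRASES = {"dress belt"}


def phrase_score(phrase):
    """Score a candidate phrase from frequency, length and lexicon membership."""
    count = QUERY_COUNTS.get(phrase, 0)
    if count == 0:
        return 0.0
    score = math.log(1 + count)              # frequency
    score *= len(phrase.split()) ** 1.5      # length boost (assumed exponent)
    if phrase in KNOWN_PHRASES:
        score *= 1.5                         # lexical boost (assumed factor)
    return score


def segment(query):
    """Pick the split of the query into phrases with the highest total score."""
    words = query.lower().split()
    # best[i] holds (total score, phrases) for the first i words.
    best = [(0.0, [])]
    for end in range(1, len(words) + 1):
        candidates = []
        for start in range(end):
            phrase = " ".join(words[start:end])
            prev_score, prev_phrases = best[start]
            candidates.append((prev_score + phrase_score(phrase), prev_phrases + [phrase]))
        best.append(max(candidates, key=lambda item: item[0]))
    return best[-1][1]


print(segment("black dress belt"))
# ['black', 'dress belt']
```

In this toy setup the lexical boost is what tips "dress belt" over "black dress", which is roughly the behavior described in the talk, although the real system surely weighs these signals differently.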
I was also happy that she talked a little about their A/B testing and the control and treatment datasets. I wonder what their metrics are.
It's interesting how they use user clicks to increase confidence in those search results.