I'm experiencing significant latency using the assistants API with retrieval. Responses range from 3 seconds to 15+ seconds.
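For reference, this is roughly how I'm measuring latency — a minimal Python sketch with the actual Assistants API call stubbed out as a hypothetical `fake_model_call` (in practice that step would submit a run on a thread and poll until its status is "completed"):

```python
import statistics
import time

def timed(call, *args, **kwargs):
    """Run `call`, returning (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = call(*args, **kwargs)
    return result, time.perf_counter() - start

def summarize(samples):
    """Return (min, mean, max) over a list of timings in seconds."""
    return min(samples), statistics.mean(samples), max(samples)

if __name__ == "__main__":
    # Stand-in for the real call; replace with the run-create/poll loop.
    def fake_model_call():
        time.sleep(0.05)
        return "response"

    timings = [timed(fake_model_call)[1] for _ in range(5)]
    lo, avg, hi = summarize(timings)
    print(f"min={lo:.2f}s mean={avg:.2f}s max={hi:.2f}s")
```

The numbers below come from wall-clock timing of the full request, including run polling.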
I have two questions for the OpenAI team:
- Which model provides faster average response times: gpt-3.5-turbo-1106 or gpt-4-1106-preview?
- Does moving from Usage Tier 3 to Tier 4 help reduce latency?
I've reviewed the documentation on production best practices and latency optimization, but I'm still looking for guidance on model choice and on whether usage tier affects latency.
Thank you for your assistance.