About the job
Job Description:
Do you relish the opportunity to spearhead the development of data architecture from the ground up? Do you thrive in an environment where you can take the reins and drive the growth of a data function? If so, we’d love for you to join our data team as a data engineer intern.
Key Responsibilities:
Web scraping:
- Efficient scraping of JavaScript-rendered sites.
- Understanding of web components, API interactions, and basic web structure.
- Technical knowledge: Node.js, JavaScript, and scraping technologies such as Scrapy, Playwright, and Selenium.
Data manipulation:
- Data cleaning and manipulation acumen. Handling various file formats – CSV, JSON, HTML.
- Technical knowledge: PySpark, Pandas, HTML.
DB Management:
- Familiarity with MongoDB and Elasticsearch queries.
- Ensure the database follows industry best practices.
- Technical knowledge: MongoDB, Elasticsearch.
Codebase management:
- Follow best practices in managing Git repositories and suggest improvements for the growing codebase.
- Technical knowledge: GitHub, version control.
Good to have:
- NLP – Text mining
- Knowledge of statistical models and LLMs.
Qualifications:
- Bachelor’s degree in Computer Science.
- 1-2 years of relevant experience.
- Proven experience automating workflows and processes, particularly in Google Sheets.
- Strong proficiency in SQL for querying databases like Redshift or similar.
- Proficiency in front-end visualization tools.
- Experience with data acquisition, cleansing, and enrichment techniques.
- Familiarity with data warehousing and ETL processes.
- Proficiency in a programming language (Python, JavaScript, etc.) for scripting and automation.
- Ability to work collaboratively in cross-functional teams and communicate technical concepts to non-technical stakeholders.
Must-have:
HTML, Python, Node.js, JavaScript, MongoDB, Elasticsearch, Selenium, Playwright, Scrapy, GitHub
Nice-to-Have:
NLP – Text mining
Knowledge of statistical models and LLMs.