I came across this rather cool hack sometime ago but only now did i finally get a chance to blog about it. When you are dumped with massive Excel files which contain lots of business data that users send with an alarm that they want to “upload” this data into the databases, you are kinda flummoxed and dread the long hours trying to clean up the Excel files, checking any referential integrity problem that might fubar everything, this little handy trick would certainly bring some relief. Thanks Craig for helping me on this one! In this post, i have focused only on data INSERTION, but I am pretty confident all sorts of queries can be whipped out in this way.
Right, so lets say you have a database with following structure:
Pretty much standard stuff and very simple one too, you have a Names table which contains a set of names associated with some main categories and some sub categories. Main and Sub Categories are in their own tables with SubCategory having many-to-one relation with MainCategory. Updating this kinda multi-relation table structure can be complex compared to inserting into just one atomic table. Now lets look at the Excel file structure:
I have made some assumptions here, for e.g. the names in the MainCategory and SubCategory are unique, afterall whats the point of having 2 main categories with same name, right? You may have 2 sub categories with same names, but as we will see that might cause conflicts. To start with generating SQL, I will first populate the Main Category table as it does not have any refrential dependency on any other table. So I copy column B onto a new sheet, remove all syntactic duplicates using Data -> RemoveDuplicates in the Excel tool bar and then I would actually write inline statements on the formula bar like so:
As you can see, most of the text is just literal statement with cell number’s address inserted where the values should go in the statement. Click and drag the contents of cell B1 until B2 or upto whatever point there is data, alternatively, you can double click on the “+” icon in the lower right corner of cell B1 and that will automatically copy the statements against every row of data with the right name inserted (as evident from the image above).
Next, I take up the SubCategory data and repeat the same process this time sandwiching a select statement to get the Id from the MainCategory table corresponding to the category name (see the importance of having unique names now?). The following image shows the SQLs for SubCategory:
Finally, I formulate the SQL for my Names table like so, this time sandwiching two select statements to fetch the Main Category id and Sub Category id based on the names of the categories:
For clarity sake, this is what I had formulated in the formula bar for the Names table:
=”insert into Names values(‘”&A2&”‘,(select MainCategory.Id from MainCategory where MainCategory.Name='”&B2&”‘),(select SubCategory.Id from SubCategory where SubCategory.Name='”&C2&”‘))”
Repeating the same copy and paste process, one can automatically generate SQL statements for any number of records. Now all that remains is copying and pasting these statements in SQL Server Management studio and running them, which i am sure I don’t know need to show how. That’s all. I didn’t try this technique with other databases like MySql etc, but i am sure it will work just as smoothly. Keep in mind the relationships of the tables in the database and what type of data is being inserted, be aware of single-quotes and apostrophes and pretty much everything you would take care while manually writing out SQLs. Hopefully, this neat trick would save you a lot of repetitive manual labor and speed up your work.